Abstract

Although William James and, more explicitly, Donald Hebb with his theory of cell assemblies suggested early on that activity-dependent
rewiring of neuronal networks is the substrate of learning and memory, over the last six decades most
theoretical work on memory has focused on plasticity of existing synapses in prewired networks. Research in the last decade 
has emphasized that structural modification of synaptic connectivity is common in the adult brain and tightly correlated 
with learning and memory. Here we present a parsimonious computational model for learning by structural plasticity. The 
basic modeling units are "potential synapses" defined as locations in the network where synapses can potentially grow to 
connect two neurons. This model generalizes well-known previous models for associative learning based on weight 
plasticity. Therefore, existing theory can be applied to analyze how many memories and how much information structural 
plasticity can store in a synapse. Surprisingly, we find that structural plasticity largely outperforms weight plasticity and can 
achieve a much higher storage capacity per synapse. The effect of structural plasticity on the structure of sparsely 
connected networks is quite intuitive: Structural plasticity increases the "effectual network connectivity", that is, the 
network wiring that specifically supports storage and recall of the memories. Further, this model of structural plasticity 
produces gradients of effectual connectivity in the course of learning, thereby explaining various cognitive phenomena 
including graded amnesia, catastrophic forgetting, and the spacing effect. 
Introduction

Traditionally, learning and memory are attributed to weight 
plasticity, that is, the modification of the strength of existing 
synapses according to variants of the Hebb rule [1-5]. Although 
the theory of weight plasticity has been crucially important in 
neuroscience and applications of artificial neural networks, it 
could not easily explain various fundamental memory-related 
effects in cognitive psychology such as graded amnesia,
the prevention of catastrophic forgetting, and the spacing 
effect. 

Another form of synaptic plasticity is structural plasticity, that is,
the creation and erasure of synapses [6-13]. Originally thought
to set up connectivity during development [14-16] or after
injuries [17,18], it has recently been shown to correlate with
memory formation and learning in the healthy adult brain [19-23].

Here we introduce and analyze a simple computational model 
of structural plasticity which exhibits surprisingly high memory 
capacity and is able to explain the mentioned cognitive effects. A 
key to understanding the role of structural plasticity in memory 
has to do with the observation that the brain, even its most 
densely connected local circuits, is far from being fully connected 
[24,25]. Thus, for any given network computation, the existing
synapses may or may not provide the optimal structure of the
network. To assess the match between existing synapses and the 
synapses required by a computation, we define effectual connectivity 
as the fraction of required synapses that are present in the 
network. By erasure and creation of synapses, structural plasticity 
can "migrate" synapses and thereby increase the effectual 
connectivity for a given network function. By integrating our 
model with well-known Hopfield- or Willshaw-type neural 
network models of memory storage and retrieval [16,26,27] we 
can quantitatively assess the benefits of structural plasticity
compared to weight plasticity. In Section 6 we show that
ongoing structural plasticity can strongly increase storage 
capacity for sparsely connected networks, which is in line with 
related approaches counting possible synaptic network configu- 
rations [28-30] or analyzing storage capacity for structural 
plasticity during development [15,16]. Moreover, our theory of 
structural plasticity suggests immediate explanations for various 
memory phenomena [31-33]. In particular, in Section 7 we
analyze the role of structural synaptic plasticity in
cortico-hippocampal memory replay and consolidation [34,35], the
prevention of catastrophic forgetting in brains [36,37], graded retrograde
amnesia following brain lesions [38-40], and the pedagogically
relevant spacing effect of learning [41-43].
Concepts and Models

1 Synapse Ensembles and Effectual Connectivity 

Common memory theories based on neural associative network 
models consider only Hebbian-type weight plasticity in networks 
with fixed structure, thus, neglecting processes involving structural 
plasticity. Such models predict that the maximal information that 
can be stored in a given neural network increases in proportion to 
the number of synaptic connections rather than number of 
neurons. Therefore, storage capacity C is often expressed in terms of 
stored information per synapse. For example, C = 0.69 bit per 
synapse (bps) for networks of binary synapses [26,44], or C = 0.72 
bps for real-valued synaptic weights [45,46] . To judge how many 
memories can be stored in a network W connecting two neuron
populations u and v each comprising n neurons, it is therefore
important to know the anatomical network connectivity

$$P := \frac{\#\,\text{synaptic connections}}{n^2} \tag{1}$$

defined as the chance that there is a synaptic connection between
two randomly chosen neurons (Fig. 1A).

For memory theories including structural plasticity the situation 
is different because we can assume that processes including 
generation of new synapses, consolidation of useful synapses, 
elimination of useless synapses, and maintenance of anatomical 
connectivity at a given level P will effectively "migrate" synapses
to locations that are most appropriate for storing a particular set of
memories. Evidently, anatomical connectivity will then be a bad
predictor of storage capacity. Rather, storage capacity will depend
crucially on the number of locations where a synapse could
potentially be generated. Such locations have been called potential
synapses [29], where potential network connectivity

$$P_{\mathrm{pot}} := \frac{\#\,\text{potential synaptic connections}}{n^2} \tag{2}$$

is the chance that there is a potential synapse between two
neurons.

It is now tempting to apply the old memory theories for weight 
plasticity as well to structurally plastic networks by simply 
replacing $P$ by $P_{\mathrm{pot}}$. The underlying argument is that the
structurally plastic network with potential connectivity $P_{\mathrm{pot}}$ would
be functionally equivalent to a structurally static network with
anatomical connectivity at the same level $P_{\mathrm{pot}}$ because real
synapses could "migrate" to any one of the $P_{\mathrm{pot}} n^2$ potential
locations. Such an approach would be valid only if the number of
required synapses does not exceed the number of actual synapses,
$P n^2$. However, the question which or how many synapses are
actually necessary for storing a particular memory set is usually 
neglected by theories for fixed networks without structural 
plasticity. Moreover, from such theories it is impossible to infer 
any temporal dynamics of structural modifications during memory 
formation. 

We therefore have to introduce another type of connectivity
measure that specifies how many synapses have actually been
formed at time $t$ between neurons that belong to a particular
memory representation. More generally, we can specify the synapse
ensemble requested to support storage of a memory set $\mathfrak{M}$ by an
$n \times n$ matrix $S$. In the simplest case $S$ is binary where non-zero
matrix entries with $S_{ij} = 1$ "tag" potential synapses from neuron $i$
to $j$ that need to be realized or consolidated for storing the
memories $\mathfrak{M}$ (Fig. 1B). Then with $W$ being the $n \times n$ matrix of
actual synaptic weights (with $W_{ij} = 0$ if there is no real synapse
from $i$ to $j$), we define the effectual connectivity of memories $\mathfrak{M}$ as the
"overlap" of actual and requested synaptic weights, for example,



[Figure 1, panel labels: A, anatomical connectivity P = 13/49; potential connectivity Ppot = 26/49.]
Figure 1. Definitions of network connectivity. Illustration of different connectivity measures for a synaptic network $W$ connecting neuron
populations $u$ to $v$ (which may be identical for recurrent networks). A, Anatomical connectivity $P$ and potential connectivity $P_{\mathrm{pot}}$ are fractions of neuron
pairs $(u_i, v_j)$ connected by an actual (black circles) and potential synapse (blue rectangles), respectively. B, The consolidation signal $S_{ij}$ specifies the
ensemble of neuron pairs that request a synapse ($S_{ij} = 1$, red circles) to support storage of a given memory set. The corresponding effectual
connectivity $P_{\mathrm{eff}}$ is then the fraction of neuron pairs requesting a synapse that are already connected by an actual synapse. The consolidation load $P_{1S}$
is the fraction of neuron pairs that request a synapse.
doi:10.1371/journal.pone.0096485.g001
$$P_{\mathrm{eff}} := \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij} S_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{n} S_{ij}} \tag{3}$$



for binary synaptic weights with $W_{ij} \in \{0,1\}$ (Fig. 1B). For real-valued
weights one could generalize this definition (e.g.,
$P_{\mathrm{eff}} := \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \min(|W_{ij}|,|S_{ij}|)}{\sum_{i=1}^{n}\sum_{j=1}^{n} |S_{ij}|}$ where $S_{ij}$ may be either
binary or real-valued, specifying the "desired" synaptic weight).
Obviously $0 \le P_{\mathrm{eff}} \le 1$ and, for Eq. 3, effectual connectivity $P_{\mathrm{eff}}$
corresponds simply to the probability that a requested synapse is
actually realized and potentiated ($W_{ij} = 1$). We also call the matrix $S$
the learning signal or consolidation signal because it specifies which
synapses should be potentiated or stabilized during memory
consolidation. For example, simple Hebbian consolidation signals
can be based on the correlations between presynaptic and
postsynaptic spike activity (see next section). Such $S$ could be
provided either by repeated bottom-up stimulus presentation or, in
the case of episodic memory, by replay from a hippocampus-like
short-term memory buffer (Fig. 2B-D). The fraction of non-zero
entries in $S$ is called the consolidation load $P_{1S}$. In larger networks it
is typically $P \le P_{\mathrm{eff}} \le P_{\mathrm{pot}}$ if locations of requested synapses $S$ are
uncorrelated to the (initial) locations of potential and actual
synapses. Our main hypothesis is that the primary function of
structural plasticity is to adapt network structure to the particular
memories to be stored. This process corresponds to an increase in
effectual connectivity $P_{\mathrm{eff}}$ from the level of anatomical connectivity
$P$ towards the level of potential connectivity $P_{\mathrm{pot}}$, which
increases storage capacity per synapse as well as space and energy
efficiency of the network [47-49].
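
To make the connectivity measures concrete, the following minimal Python sketch computes anatomical connectivity, potential connectivity, consolidation load, and effectual connectivity (Eq. 3, binary case) for small hand-made matrices. The function name `connectivity_measures`, the separate `synapse` matrix marking realized synapses, and the 3 x 3 example values are assumptions made for this illustration only; they are not taken from the original simulation code.

```python
def connectivity_measures(synapse, W, S, potential):
    """Connectivity measures for binary n x n matrices (lists of lists).

    synapse[i][j]   = 1 if a real synapse (silent or consolidated) connects i to j
    W[i][j]         = 1 if that synapse is potentiated (binary weight)
    S[i][j]         = 1 if the consolidation signal requests a synapse from i to j
    potential[i][j] = 1 if a potential synapse exists from i to j
    """
    n = len(S)
    pairs = n * n

    def total(M):
        return sum(sum(row) for row in M)

    requested = total(S)
    # Numerator of Eq. 3: requested synapses that are realized and potentiated.
    overlap = sum(W[i][j] * S[i][j] for i in range(n) for j in range(n))
    return {
        "P": total(synapse) / pairs,          # anatomical connectivity (Eq. 1)
        "P_pot": total(potential) / pairs,    # potential connectivity (Eq. 2)
        "P_1S": requested / pairs,            # consolidation load
        "P_eff": overlap / requested if requested else 0.0,  # Eq. 3
    }


# Hypothetical toy network with n = 3 neurons per population.
potential = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
synapse = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
W = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
S = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]

measures = connectivity_measures(synapse, W, S, potential)
```

In this toy example the requested pair (0, 1) has a potential synapse but no real one, so structural plasticity could still raise the effectual connectivity by growing a synapse there.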

2 Model of Structural Plasticity and Consolidation 

Figure 2A illustrates a minimal state model for a "potential"
synapse. Here a potential synapse $ij\nu$ is the possible location of a
real synapse connecting neuron $i$ to neuron $j$, for example, a
cortical location where axonal and dendritic branches of neurons $i$
and $j$ are close enough to allow the formation of a novel
connection by spine growth and synaptogenesis [29]. As dendrites
and axons may closely overlap at multiple locations, in general,
there may be multiple potential synapses ($\nu = 1, 2, \ldots$) between a
neuron pair $ij$. Our minimal model has only three states: A
synapse can be either potential but not yet realized (state $\pi$),
realized but silent (state and weight 0), or realized and
consolidated (state and weight 1). For real synapses, state
transitions are modulated by the consolidation signal $s = S_{ij}$.

Then structural plasticity means the transition processes between
states $\pi$ and 0 described by transition probabilities
$p_g := \mathrm{pr}[\mathrm{state}(t+1) = 0 \mid \mathrm{state}(t) = \pi]$ and
$p_{e|s} := \mathrm{pr}[\mathrm{state}(t+1) = \pi \mid \mathrm{state}(t) = 0, S_{ij} = s]$. Similarly, weight plasticity means the
transitions between states 0 and 1 described by
$p_{c|s} := \mathrm{pr}[\mathrm{state}(t+1) = 1 \mid \mathrm{state}(t) = 0, S_{ij} = s]$ and
$p_{d|s} := \mathrm{pr}[\mathrm{state}(t+1) = 0 \mid \mathrm{state}(t) = 1, S_{ij} = s]$. In accordance with the diagram of
Fig. 2A, the evolution of synaptic states can then be described
by probabilities $p_{\mathrm{state}}(t)$ that a given potential synapse is in a
certain state $\in \{\pi, 0, 1\}$ at time step $t = 0, 1, 2, \ldots$,

$$\begin{aligned}
p_1(t) &= (1 - p_{d|s(t)})\, p_1(t-1) + p_{c|s(t)}\, p_0(t-1) \\
p_0(t) &= (1 - p_{c|s(t)} - p_{e|s(t)})\, p_0(t-1) + p_{d|s(t)}\, p_1(t-1) + p_g\, p_\pi(t-1) \\
p_\pi(t) &= (1 - p_g)\, p_\pi(t-1) + p_{e|s(t)}\, p_0(t-1) = 1 - p_1(t) - p_0(t),
\end{aligned} \tag{4}$$

where the (Hebbian) consolidation signal $s(t) = S_{ij}(t)$ may depend
on time. Note that we assume $p_g$ to be independent of $s$ because it
is unclear how to provide $S_{ij}$ with high spatial precision $ij$ to not
yet realized potential synapses. Instead, $p_g$ may rather be under
the control of homeostatic mechanisms to keep the number of
synapses or the resulting mean firing rates of a neuron at a desired
level [50]. The model could easily be extended towards more
biological realism by additional state transitions (e.g., from 1 to $\pi$
[51]), a cascade of further synaptic states [52], or graded synaptic
weights [53,54], but here the focus is on the essential properties of
the interplay between structural and weight plasticity.
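
The update rule of Eq. 4 can be illustrated with a short Python sketch that iterates the state probabilities of a single potential synapse. The function name `step_state_probabilities`, the dictionary layout of the transition probabilities, and all numerical values are assumptions chosen for this example; they are not the parameters of the original simulations.

```python
def step_state_probabilities(p, s, p_g, p_e, p_c, p_d):
    """One application of Eq. 4.

    p   = (p_pi, p_0, p_1): probabilities of states pi, 0 and 1 at time t-1
    s   = consolidation signal at time t (0 or 1)
    p_g = generation probability (independent of s)
    p_e, p_c, p_d = elimination, consolidation and deconsolidation
                    probabilities, each a dict keyed by s
    """
    p_pi, p_0, p_1 = p
    new_1 = (1 - p_d[s]) * p_1 + p_c[s] * p_0
    new_0 = (1 - p_c[s] - p_e[s]) * p_0 + p_d[s] * p_1 + p_g * p_pi
    new_pi = (1 - p_g) * p_pi + p_e[s] * p_0
    return new_pi, new_0, new_1


# Hypothetical parameters: a requested synapse (s = 1) is consolidated as soon
# as it exists and is never eliminated or deconsolidated.
p_g = 0.1
p_e = {0: 0.5, 1: 0.0}
p_c = {0: 0.0, 1: 1.0}
p_d = {0: 0.0, 1: 0.0}

p = (1.0, 0.0, 0.0)  # start as a potential but unrealized synapse
trajectory = [p]
for _ in range(100):
    p = step_state_probabilities(p, 1, p_g, p_e, p_c, p_d)
    trajectory.append(p)
```

Because each row of the transition scheme sums to one, the three probabilities remain normalized at every step, which is the identity stated in the last line of Eq. 4.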

Figure 2. Model of structural plasticity and consolidation. A, State/transition model of a single potential synapse (see text for details). B, In
the following we consider potential synapses in a network $W$, for example, connecting two cortical neuron populations $u$ and $v$. Memories
correspond to associations between activity patterns $u^\mu$ and $v^\mu$. We will specifically analyze how well noisy activity patterns $\tilde{u}^\mu$ can reactivate the
corresponding memories $v^\mu$ in order to estimate storage capacity. C, D: LTM storage (solid) by structural plasticity requires repetitive reactivation of
activity patterns in cortical populations $u$ and $v$ to provide an appropriate consolidation signal $S$ to the synapses. This may happen by repeated
bottom-up stimulation (D) or, for episodic memories, by top-down replay (C) from a HC-type STM buffer (dashed). LTM = long-term memory;
STM = short-term memory; HC = hippocampus.
doi:10.1371/journal.pone.0096485.g002

Figure 3. Learning in Willshaw-type associative networks. A, Memory storage by Hebbian weight plasticity (Eq. 5) in a fully connected
network ($P = 1$). Address patterns $u^\mu$ are associated to content patterns $v^\mu$ where $\mu = 1, \ldots, M$ (here $M = 2$). Each memory is represented by a binary
activity vector of length $n = 7$ having $k = 4$ active units (which define the corresponding cell assembly). B, One-step retrieval of the first memory from
a noisy query pattern $\tilde{u} \approx u^1$ having two of the four active units in $u^1$ ($\lambda = 0.5$). Here $\tilde{u}$ can perfectly reactivate the corresponding memory pattern
in population $v$ ($\hat{v} = v^1$) applying a firing threshold $\Theta = \sum_i \tilde{u}_i = 2$ on dendritic potentials $x_j = \sum_{i=1}^{n} \tilde{u}_i W_{ij}$. C, As a simple form of structural plasticity,
silent synapses can be pruned after learning. The resulting network has only 28 (instead of 49) synapses corresponding to a lower anatomical
connectivity $P \approx 0.57$, whereas the effectual connectivity is still $P_{\mathrm{eff}} = 1$. Thus, pruning does not change network function, but increases stored
information per synapse. D, Ongoing structural plasticity can similarly increase storage capacity during more realistic learning in networks with low
anatomical connectivity (here $P = 28/49 \approx 0.57$). During each time step $t = 1, 2, 3, 4$, Hebbian weight plasticity potentiates and consolidates synapses $ij$
with non-zero consolidation signal $S_{ij} > 0$ (which equals $W_{ij}$ of panel A), whereas the remaining silent synapses are eliminated and replaced by new
synapses at random locations. Note that the resulting network at $t = 4$ is the same as in panel C.
doi:10.1371/journal.pone.0096485.g003

For the microscopic simulations of individual synapses as
displayed in Figs. 4 and 6 we have used the Felix++ simulation
tool [55] to implement large networks with many
potential synapses and to simulate network evolution by
random sampling of synaptic state variables in discrete time steps.
A simple match of the simulation time scale to physiological data
can be obtained from the mean lifetime of unconsolidated
unrequested synapses: For $p_{e|0} > 0$ the mean lifetime is

$$\sum_{t=0}^{\infty} t\, p_{e|0} (1 - p_{e|0})^{t-1} = \frac{p_{e|0}}{1 - p_{e|0}} \sum_{t=0}^{\infty} t\, (1 - p_{e|0})^{t} = \frac{1}{p_{e|0}}$$

simulation steps. This may be compared, for example, to the lifetime
of a few days reported for unstable spines in adult animals [10].
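
The mean lifetime of 1/p_{e|0} simulation steps can be checked numerically with a truncated version of the geometric sum. The following Python sketch is only an illustration; the function name `mean_lifetime`, the truncation `horizon`, and the two test values of the elimination probability are assumptions of this example.

```python
def mean_lifetime(p_e0, horizon=2000):
    """Truncated sum of t * p_e0 * (1 - p_e0)**(t - 1) over t = 1..horizon.

    p_e0 is the per-step elimination probability of a silent, unrequested
    synapse. The t = 0 term of the infinite sum is zero and is skipped.
    """
    return sum(t * p_e0 * (1 - p_e0) ** (t - 1) for t in range(1, horizon + 1))


lifetime_slow = mean_lifetime(0.1)  # analytic value 1 / 0.1 = 10 steps
lifetime_fast = mean_lifetime(0.5)  # analytic value 1 / 0.5 = 2 steps
```

Under this relation, choosing the elimination probability fixes how many simulation steps correspond to the lifetime of an unstable spine.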

On the network level we use corresponding macroscopic variables
$P_\pi^{(s)}$, $P_0^{(s)}$, and $P_1^{(s)}$ defined as the fraction of neuron pairs that have
a potential synapse in a certain state and receive a certain
consolidation signal $s$. From this we can derive the connectivity
variables defined in the previous section, in particular,
$P := \sum_s \left(P_1^{(s)} + P_0^{(s)}\right)$ and $P_{\mathrm{eff}} = P_1^{(1)} / P_{1S}$ for binary $s$ (see Sect.
Mathematical Analysis I for details). In most simulations of
(adult) memory processes (Figs. 3D, 4, 6), we have assumed that the
rates of synapse generation and elimination are in homeostatic
balance to maintain either a constant anatomical network
connectivity $P$ or a constant number $P n^2$ of actual synapses.

Figure 4. Increase of effectual connectivity during memory consolidation with ongoing structural plasticity. Each curve shows the
evolution of effectual connectivity $P_{\mathrm{eff}}$ as a function of time $t$ for different parameters $P$ (anatomical connectivity), $P_{\mathrm{pot}}$ (potential connectivity), $P_{1S}$
(consolidation load), and $P_1(0)$ (fraction of initially consolidated synapses). Data are from single microscopic network simulations (solid black; cf. Eq. 4;
network size $n = 1000$) and macroscopic theory (dashed gray; Eq. 11). See Table 1 for further simulation parameters. A: $P_{\mathrm{eff}}(t)$ for different
consolidation loads $P_{1S}$ and constant $P = 0.1$, $P_{\mathrm{pot}} = 1$, $P_1(0) = 0$. B: $P_{\mathrm{eff}}(t)$ for different fractions of initially consolidated synapses $P_1(0)$ and constant
$P = 0.1$, $P_{\mathrm{pot}} = 1$, $P_{1S} = 0.01$. C: $P_{\mathrm{eff}}(t)$ for different anatomical connectivities $P$ and constant $P_1(0) = 0.1$, $P_{\mathrm{pot}} = 1$, $P_{1S} = 0.001$.
doi:10.1371/journal.pone.0096485.g004

The relation between synapse and network variables is non-trivial
in general because there may be multiple potential synapses
$\nu = 1, 2, \ldots$ per neuron pair $ij$ (see Sect. Mathematical Analysis
I.1), for example around 5-10 between two connected neighboring
cortical neurons [56-60]. Nevertheless, we argue that even our
simple binary model with only a single synapse per connected
neuron pair bears significant biological relevance because it has
been reported that the number of actual synapses per connected
neuron pair and also the total synaptic weight is surprisingly
similar across neurons (see discussion section; cf. [59,61]).
Therefore, we have analyzed this simple model to obtain the
results presented below and in Section 6 (see Figs. 4-5). To
improve biological realism of our simulation experiments in
Section 7 (Fig. 6), we have tested our ideas also with a second
model variant that allows multiple synapses per neuron pair, where
each of the $P n^2$ actual synapses of the network can be allocated to
one of the $P_{\mathrm{pot}} n^2$ potential locations independently of other
synapses. Additional simulations (not shown) have indicated that
both model variants yield qualitatively very similar results unless
the replay time for a given consolidation signal was very long.
Then the second model variant tended to accumulate all available
synapses at the locations specified by the consolidation signal such
that neuron pairs were connected by a large number of synapses.
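
The interplay of weight plasticity, elimination, and homeostatic regrowth described in this section can be sketched as a small microscopic simulation. The Python code below is a simplified stand-in, not the Felix++ implementation used for the figures: it assumes at most one synapse per neuron pair, immediate consolidation of requested synapses, no deconsolidation, and a constant number of synapses. The function name `simulate_effectual_connectivity` and all parameter values are hypothetical.

```python
import random


def simulate_effectual_connectivity(n=60, P=0.1, P_pot=0.5, P_1S=0.05,
                                    p_e0=0.5, steps=40, seed=0):
    """Return the effectual connectivity (Eq. 3) after 0, 1, ..., steps steps."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    potential = [pair for pair in pairs if rng.random() < P_pot]
    requested = {pair for pair in pairs if rng.random() < P_1S}

    # State of each potential synapse: "pi" (unrealized), 0 (silent), 1 (consolidated).
    state = {pair: "pi" for pair in potential}
    for pair in rng.sample(potential, round(P * n * n)):
        state[pair] = 0

    history = []
    for _ in range(steps + 1):
        consolidated = sum(1 for pair in requested if state.get(pair) == 1)
        history.append(consolidated / len(requested))
        # Weight plasticity: requested silent synapses are consolidated.
        for pair in potential:
            if state[pair] == 0 and pair in requested:
                state[pair] = 1
        # Structural plasticity: silent unrequested synapses are eliminated ...
        removed = [pair for pair in potential
                   if state[pair] == 0 and rng.random() < p_e0]
        for pair in removed:
            state[pair] = "pi"
        # ... and replaced at random free potential locations (homeostasis).
        free = [pair for pair in potential if state[pair] == "pi"]
        for pair in rng.sample(free, len(removed)):
            state[pair] = 0
    return history


history = simulate_effectual_connectivity()
```

With these assumed settings the effectual connectivity starts at zero, jumps to roughly the anatomical connectivity after the first step, and then creeps towards the potential connectivity as silent synapses migrate to requested locations.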

3 Models for Memory Storage and Retrieval 

The model presented so far is of general relevance for any 
neural theory of memory, because it is independent of any specific 
mechanisms for memory storage and retrieval: Any learning and 
storing mechanisms are only implicitly conveyed by the learning 
signal S that "tags" potential synapses for later consolidation. 
Similarly, memory recall is not directly described in the model so 
far. Rather, our theory describes effectual connectivity $P_{\mathrm{eff}}$ which
is closely linked to retrieval performance for a given memory set.
To explain this link and to allow a more quantitative performance 
evaluation, the next section instantiates and analyzes our model 
within a common neural network framework of memory storage 
and recall. 



A particularly simple memory model based on Hebbian
learning of binary synapses is the Steinbuch or Willshaw model
[26,44,62]. In the general hetero-associative setup (Fig. 3A), memories
correspond to binary spike activity vectors $u^\mu$ and $v^\mu$ stored in a
synaptic connection $W$ linking two neuron populations $u$ and $v$.
By choosing the auto-associative setup with identical $u$ and $v$, the
Willshaw model can be applied as well to model memory processes
in local recurrent connections (cf. Fig. 2B). The average number $k$
of one-entries in an activity vector is called pattern activity and
corresponds to the mean size of local Hebbian cell assemblies in
populations $u$ and $v$. After storing a set of $M$ memory associations in
a network without structural plasticity, the weight of an actual
synapse connecting neuron $u_i$ to neuron $v_j$ is

$$W_{ij} = \min\left(1, \sum_{\mu=1}^{M} u_i^\mu \cdot v_j^\mu\right) \in \{0, 1\}. \tag{5}$$

Note that a synapse in the Willshaw model is actually a
special case of our model of a potential synapse because Eq. 5
instantiates Eq. 4 for $S_{ij} = W_{ij}$, $P = P_{\mathrm{pot}} = P_{\mathrm{eff}}$, $p_{c|1} = 1$, and
$p_g = p_{e|s} = p_{c|0} = p_{d|s} = 0$.
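
As a small illustration of the clipped Hebbian rule in Eq. 5, the following Python sketch builds the binary weight matrix from a list of address and content patterns. The function name `willshaw_store` and the two toy associations (n = 4, k = 2) are assumptions of this example and do not reproduce the patterns of Fig. 3.

```python
def willshaw_store(addresses, contents):
    """Binary Willshaw matrix: W[i][j] = min(1, sum_mu u_i^mu * v_j^mu) (Eq. 5)."""
    n_u, n_v = len(addresses[0]), len(contents[0])
    W = [[0] * n_v for _ in range(n_u)]
    for u, v in zip(addresses, contents):
        for i in range(n_u):
            if u[i]:
                for j in range(n_v):
                    if v[j]:
                        W[i][j] = 1  # clipping: a synapse is potentiated at most once
    return W


# Two hypothetical associations u^mu -> v^mu.
U = [[1, 1, 0, 0], [0, 1, 1, 0]]
V = [[0, 1, 1, 0], [1, 0, 0, 1]]
W = willshaw_store(U, V)
```

Row 1 of the resulting matrix is fully potentiated because address neuron 1 is active in both memories, which is the source of crosstalk when many patterns overlap.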



[Figure 5, panel titles: A, pattern capacity, $n = 10^5$; B, weight capacity $C^{\mathrm{wp}}$, $n = 10^5$; C, total synaptic capacity $C^{\mathrm{tot}}$, $n = 10^5$.]

Figure 5. Storage capacities for a finite Willshaw network having the size of a cortical macrocolumn ($n = 100000$). A, Contour plot of
pattern capacity (number of stored memories) as a function of assembly size $k$ (number of active units in a memory vector) and effectual network
connectivity $P_{\mathrm{eff}}$ assuming output noise level $\epsilon = 0.01$ and noise-free input patterns ($\lambda = 1$, $\kappa = 0$). B, Weight capacity $C^{\mathrm{wp}}$ for the same setting as in
panel A. C, Total storage capacity including structural plasticity for the same setting as in A. Note that even modest increases of $P_{\mathrm{eff}}$ can strongly
increase storage capacity, in particular for sparse neural activity (small $k$) [82]. All data computed from Gaussian approximation of dendritic potential
distributions (see appendix II.2).
doi:10.1371/journal.pone.0096485.g005
Figure 6. Simulation of catastrophic forgetting, Ribot gradients, and the spacing effect. A, Networks without structural plasticity suffer
from catastrophic forgetting (top), but networks with structural plasticity do not (bottom). Plots show output noise $\hat{\epsilon}$ over time $t$ simulating networks
of size $n = 1000$ and activity $k = 50$ storing 25 memory blocks one after the other (only the interesting part between storage of blocks 6 and 21 is
visible). Each curve (with a distinct color) corresponds to $\hat{\epsilon}$ for noisy test patterns of a particular memory block with $c = 45$ correct and $f = 5$ false
active units. The steep descent of each curve corresponds to the time when the hippocampus started to replay the corresponding memory block for
5 time steps. B, Networks employing structural plasticity show Ribot gradients after a cortical lesion (top) due to gradients in effectual connectivity
(bottom). The lesion was simulated by deactivating half of the neurons in population $u$ at time $t = 20$. C, Networks employing structural plasticity
reproduce the spacing effect of learning. In the first simulation (blue) novel memories were rehearsed once for 20 time steps (blue arrow at
$t = 0$-$19$). In a second simulation (red) the same total rehearsal time was "spaced" or distributed to four brief intervals of five steps each (red arrows
at $t = 0$-$4$, $t = 100$-$104$, $t = 200$-$204$, and $t = 300$-$304$). Here the network achieves a higher effectual connectivity $P_{\mathrm{eff}}$ (bottom) and less retrieval
noise $\hat{\epsilon}$ (top). See Sections 2, 3 and Table 1 for further details and simulation parameters.
doi:10.1371/journal.pone.0096485.g006



Memory retrieval means the re-activation of a previously stored
content pattern $v^\mu$ in neuron population $v$ following the activation
of a (noisy) address pattern $\tilde{u}$ in population $u$. The simplest
retrieval procedure is "one-step retrieval" with adaptive threshold
control [63]. Specifically, an input pattern $\tilde{u}$ is propagated
synchronously from population $u$ to population $v$ as illustrated in
Fig. 3B. Then dendritic potentials of the neurons in population $v$
are given by simple vector-matrix multiplication, $x := \tilde{u} W$, and
the retrieval output $\hat{v}$ is obtained from $x$ by applying a vector of
spike thresholds $\Theta$,

$$\hat{v}_j = \begin{cases} 1, & x_j = \sum_{i=1}^{n} \tilde{u}_i W_{ij} \ge \Theta_j \\ 0, & \text{otherwise} \end{cases} \tag{6}$$



where $\Theta$ is chosen to obtain close to $k$ active units in $\hat{v}$. We can
then evaluate retrieval quality by estimating the output noise level

$$\hat{\epsilon} := \frac{(n - k)\, q_{01} + k\, q_{10}}{k} \tag{7}$$

defined as the mean Hamming distance
$d_H(\hat{v}, v^\mu) := \sum_{j=1}^{n} |\hat{v}_j - v_j^\mu|$ between retrieval output $\hat{v}$ and the
original memory $v^\mu$ normalized to the cell assembly size $k$. Here
$q_{01} := \mathrm{pr}[\hat{v}_j = 1 \mid v_j^\mu = 0]$ and $q_{10} := \mathrm{pr}[\hat{v}_j = 0 \mid v_j^\mu = 1]$ are component
error probabilities. Similarly, we can define input noise
$\tilde{\epsilon} := ((n - k)\, p_{01} + k\, p_{10})/k$ as the normalized Hamming distance
between input pattern $\tilde{u}$ and the original address memory $u^\mu$. We
will also express input noise in terms of parameters $\lambda := 1 - p_{10}$
(completeness) and $\kappa := p_{01}(n - k)/k$ (add noise).
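
One-step retrieval (Eq. 6) and the normalized Hamming distance underlying the output noise of Eq. 7 can be sketched in a few lines of Python. The helper names `one_step_retrieval` and `output_noise`, the hard-coded toy matrix, and the choice of setting the threshold to the number of active query units (which presumes a query without false-active units, as in Fig. 3B) are assumptions of this illustration. Note that the code evaluates the distance for a single retrieval, whereas Eq. 7 refers to its mean.

```python
def one_step_retrieval(W, u_query, theta=None):
    """Eq. 6: threshold the dendritic potentials x_j = sum_i u_i * W[i][j]."""
    n_u, n_v = len(W), len(W[0])
    x = [sum(u_query[i] * W[i][j] for i in range(n_u)) for j in range(n_v)]
    if theta is None:
        theta = sum(u_query)  # assumed threshold: number of active query units
    return [1 if x_j >= theta else 0 for x_j in x]


def output_noise(v_hat, v_true):
    """Hamming distance between output and stored pattern, normalized to k."""
    k = sum(v_true)
    return sum(abs(a - b) for a, b in zip(v_hat, v_true)) / k


# Hypothetical Willshaw matrix storing u1 = [1,1,0,0] -> v1 = [0,1,1,0]
# and u2 = [0,1,1,0] -> v2 = [1,0,0,1] (n = 4, k = 2).
W = [[0, 1, 1, 0], [1, 1, 1, 1], [1, 0, 0, 1], [0, 0, 0, 0]]
v1 = [0, 1, 1, 0]

full = one_step_retrieval(W, [1, 1, 0, 0])       # complete address pattern
partial = one_step_retrieval(W, [1, 0, 0, 0])    # half of u1, unique to memory 1
ambiguous = one_step_retrieval(W, [0, 1, 0, 0])  # half of u1, shared with u2

noise_full = output_noise(full, v1)
noise_partial = output_noise(partial, v1)
noise_ambiguous = output_noise(ambiguous, v1)
```

The ambiguous query activates both stored content patterns, so its normalized distance to the first memory equals one, while the other two queries retrieve the memory without error.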

We have used one-step retrieval for some of our experiments
(Fig. 5) because it is the easiest to analyze, for example, for
estimating the memory capacity of a single network (see below).
However, for the investigation of memory phenomena, there exist
more realistic retrieval methods that are based on spiking neurons
and iterative (gamma range) oscillatory activity propagation
[64,65]. As such models are computationally very demanding, in
particular, when simulating longer time intervals in the range of
months to years, it is more favorable to use simple iterative
extensions of one-step retrieval [27,63,66,67] that can still mimic
many relevant properties of the realistic models.

In particular, iterative retrieval avoids the most serious 
limitation of one-step retrieval, that is, the lack of a sufficient 
attractor behavior: High output noise after one-step retrieval does 
not exclude perfect retrieval after iterated retrieval steps. In fact, as 
long as the output noise level after the first step is smaller than the 
input noise level, the iterative retrieval procedure is likely to reduce 
output noise to zero in subsequent retrieval steps. As a 
consequence, for individual memories, the relation between input 
and output noise will be much steeper if using the iterative models: 
A memory pattern can be retrieved either perfectly or the number
of component errors is very high. Still, one-step retrieval is useful 
by providing lower bounds (because of its suboptimality) and 
upper bounds (assuming zero input noise) of the true storage 
capacity. 

For our long-term simulations of memory phenomena (Fig. 6)
we have therefore extended the Willshaw model in two ways: First,
similarly to the illustration in Fig. 2B, we have also included Willshaw-type
auto-associative connections in addition to the hetero-associative
link from $u$ to $v$ in order to account for the rich
recurrent connectivity of cortex and to enable iterative refinement
of retrieval outputs. Second, we have implemented an iterative
retrieval procedure as follows (cf. [63]): In an initial step, the input
pattern $\tilde{u}$ is propagated through the hetero-associative connections
from $u$ to population $v$, in which the $k$ neurons with the largest
dendritic potentials become active, resulting in a preliminary
retrieval result $\hat{v}^{(0)}$. In similar further steps, this preliminary result
was then iteratively propagated through the auto-associative
network of population $v$ yielding refined retrieval outputs $\hat{v}^{(i)}$ for
$i = 1, 2, \ldots$ (where all recurrent connections to $u$ were inactivated).
Typically, a small number of iterations was sufficient to obtain
stable outputs. For evaluation of output noise $\hat{\epsilon}$ we used the activity
pattern $\hat{v}^{(3)}$ after 3 iterations and compared it to the original
memory pattern $v^\mu$ to estimate component error probabilities $q_{01}$
and $q_{10}$ (see Eq. 7).
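
The iterative procedure can be outlined in Python as a k-winners-take-all step through the hetero-associative matrix followed by a few refinement steps through the auto-associative matrix. This is a minimal sketch under assumed names (`willshaw_store`, `top_k`, `propagate`, `iterative_retrieval`) with toy patterns and an arbitrary tie-breaking rule (lowest index wins); it is not the implementation used for Fig. 6.

```python
def willshaw_store(xs, ys):
    """Clipped Hebbian storage as in Eq. 5."""
    W = [[0] * len(ys[0]) for _ in range(len(xs[0]))]
    for x, y in zip(xs, ys):
        for i, x_i in enumerate(x):
            for j, y_j in enumerate(y):
                if x_i and y_j:
                    W[i][j] = 1
    return W


def propagate(pattern, W):
    """Dendritic potentials of the target population."""
    return [sum(pattern[i] * W[i][j] for i in range(len(W)))
            for j in range(len(W[0]))]


def top_k(x, k):
    """Activate the k units with the largest potentials (ties: lowest index)."""
    order = sorted(range(len(x)), key=lambda j: (-x[j], j))
    active = set(order[:k])
    return [1 if j in active else 0 for j in range(len(x))]


def iterative_retrieval(W, A, u_query, k, iterations=3):
    v_hat = top_k(propagate(u_query, W), k)   # initial hetero-associative step
    for _ in range(iterations):               # auto-associative refinement
        v_hat = top_k(propagate(v_hat, A), k)
    return v_hat


# Hypothetical memories with n = 6 and k = 3.
U = [[1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0]]
V = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
W = willshaw_store(U, V)   # hetero-associative link from u to v
A = willshaw_store(V, V)   # auto-associative connections within v

result = iterative_retrieval(W, A, [1, 0, 1, 0, 0, 0], k=3)
```

In this example the query contains two of the three active units of the first address pattern, and the first content pattern is recovered and remains stable under the refinement steps.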

For the simulations involving structurally plastic networks and
long-term consolidation (Fig. 6) we have divided the overall
memory set into multiple blocks $\beta = 1, 2, \ldots$ each containing
several individual memory patterns. Each memory block defines a
consolidation signal that is identical to the Willshaw matrix
(Eq. 5) obtained from the corresponding subset of memories. Thus,
memory blocks are consolidated one after the other, each for a
certain number of simulation steps, by reactivating the corresponding
activity patterns in populations $u$ and $v$ to mimic either
hippocampal short-term storage and top-down replay (Fig. 2B,C)
or repeated bottom-up rehearsal of the corresponding memories
(Fig. 2B,D). Fig. 6 shows simulations with structural plasticity in
the connection $W$ linking $u$ to $v$. By contrast, the recurrent
connections within $u$ and $v$ were prewired without any structural
plasticity and auto-associatively stored the individual patterns $u^\mu$
and $v^\mu$ with a fixed connectivity ($P = 1$ for Fig. 6A, upper panel;
$P = 0.2$ for Fig. 6A, lower panel; $P = 0.1$ for Fig. 6B,C). Table 1
summarizes the remaining simulation parameters.

4 Definitions of Storage Capacity 

The storage capacity is the amount of information (in bits) that a
neural network can store (and retrieve) per synapse. There are two
contributions to the total capacity $C^{\mathrm{tot}}$ of a synapse,

$$C^{\mathrm{tot}} \le C^{\mathrm{wp}} + C^{\mathrm{sp}}. \tag{8}$$

First, the weight capacity $C^{\mathrm{wp}}$ is the information stored by
modification of the synaptic weight for a fixed network structure
(a more general definition could as well include any other
modifications of synaptic state variables such as synaptic transmission
delay). Second, the structural capacity $C^{\mathrm{sp}}$ is the information
stored by selecting an appropriate target location for a synapse
with fixed weight. We would like to evaluate storage capacity at a
limited small output noise level $\hat{\epsilon}$ (see Eq. 7): The "stored
information" can then be computed from the pattern capacity $M_\epsilon$
defined as the maximum number of memories that can be stored
at noise level $\epsilon$, whereas the weight capacity $C^{\mathrm{wp}}_\epsilon$ is the stored
information normalized to the number of synapses in a static
network (no structural plasticity) with connectivity $P$,

$$M_\epsilon := \max\{M : \hat{\epsilon} \le \epsilon\} \tag{9}$$

$$C^{\mathrm{wp}}_\epsilon := \frac{M_\epsilon\, T(k/n, q_{01}, q_{10})}{P n} \tag{10}$$

where $T(q, q_{01}, q_{10})$ is the transinformation (or mutual information)
when transmitting independent memory components $v_j^\mu$ (with
$q := \mathrm{pr}[v_j^\mu = 1] = k/n$) over a binary channel (with transition
probabilities $q_{01}$ and $q_{10}$ as in Eq. 7) and receiving $\hat{v}_j$ (for details
see appendix A in [16]). In general, it is difficult to disentangle the
two contributions $C^{\mathrm{wp}}$ and $C^{\mathrm{sp}}$. Thus, in the results section we will
compute the total capacity $C^{\mathrm{tot}}$ for some special cases.

Results

5 Structural Plasticity Increases Effectual Connectivity

In the previous section we have introduced effectual connectivity
$P_{\mathrm{eff}}$ as a measure of how well a given set of memories is
stored in a synaptic network. Without any structural changes of the
network, $P_{\mathrm{eff}}$ will obviously remain constant, for example, at the
level of anatomical connectivity $P$ for novel memories that do not
correlate with the current network structure. It is therefore more
interesting to investigate the dynamics of $P_{\mathrm{eff}}$ during phases of
ongoing structural plasticity. For consistency with experimental
observations it seems most reasonable to focus on a parameter
range where structural plasticity operates on a slower time scale
than Hebbian-type weight plasticity ($p_{e|0} \ll p_{c|1}$), but on a faster
time scale than the lifetime of stable consolidated synapses
($p_{e|0} \gg p_{d|1}$).

It is indeed possible to analyze our model in such a parameter
regime: In Sect. Mathematical Analysis I.2 we compute the
temporal evolution of effectual connectivity during consolidation
of a novel memory set under the following simplifying assumptions:
1) Large networks with $n \gg 1$ such that all macroscopic
variables $P_{\mathrm{state}}$ are close to their means; 2) at most a single synapse
per neuron pair; 3) binary consolidation signal $s \in \{0,1\}$; 4) new
memories specified by $S$ are independent of initial network
structure and any old memories; 5) immediate consolidation with
$p_{c|s} = s$; 6) $p_{d|1} = p_{e|1} = 0$; 7) $p_g$ and $p_{e|0}$ in homeostatic balance such
that $P(t)$ is constant. Then effectual connectivity for a new set of
memories increases from $P_{\mathrm{eff}}(0) = P_1(0)$ before any learning starts
to

Pes{t)~Ppoi-{Pvox-P)m;Li 

P-{\-P,s){\ -PdioY^Pm-PisPMh) 



1+Pe|0- 



pot 



(Ppot-P)e 



Ppot~P 



0) 



(11) 



assuming that S is provided at each time step t=l,2,... (e.g., 
by memory replay) and Pi(0) : =Pj''' + Pj'' is the fraction of 
initially consolidated synapses (corresponding to old memories). 
The second approximation additionally presumes Pis«l and 
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o Si 

1- Q. 



o Pd\o « 1 ■ Thus, convergence of Peff towards Ppot requires 

|^|| Pls<P/Ppot (for pdio>0) or Pis<(P-Pi(0))/(Ppot-Pi(0)) 

^ J o E (for pdio = 0). Also note that during the first consohdation step there 

S .| £ is a quick increase from Peff (0) = Pi (0) to Peff (1) = P followed by 

^ ™ o 2 a much slower increase towards Ppot in the subsequent steps. 

Q. "I £ Section 7.1 relates this behavior to the spacing effect as a possible 

.S~ >, o i explanation why several brief learning sessions are generally more 

G c ■- 5 effective than a single long session. 
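The qualitative behavior described above (a fast first step up to P, then a slow approach to Ppot that slows down with larger consolidation load) can be illustrated with a small mean-field iteration. This is only a sketch under the stated simplifying assumptions, not the authors' simulation code: the function name, the parameter values, and the exact bookkeeping of silent synapses are assumptions made here for illustration.

```python
def effectual_connectivity(P, P_pot, P_1S, P1_0, p_e0, p_d0, steps):
    """Mean-field trajectory [Peff(0), ..., Peff(steps)] for a new memory set.

    Illustrative recursion (an assumption of this sketch): silent synapses are
    eliminated with probability p_e0 per step and regrown at random free
    potential sites, so that the anatomical connectivity P stays constant.
    """
    traj = [P1_0, P]          # Peff(0) = P1(0); the first replay consolidates all
                              # existing synapses at useful sites, so Peff(1) = P
    for t in range(1, steps):
        p_eff = traj[-1]
        old = (1.0 - P_1S) * (1.0 - p_d0) ** t * P1_0   # old consolidated synapses
        silent = max(0.0, P - old - P_1S * p_eff)        # silent fraction P0(t)
        p_g = p_e0 * silent / (P_pot - P)                # balances elimination
        traj.append(p_eff + p_g * (P_pot - p_eff))
    return traj[: steps + 1]

# Hypothetical parameter values, chosen only so that P_1S < P / P_pot holds.
light = effectual_connectivity(0.1, 0.4, 0.05, 0.0, 0.1, 0.0, 200)
heavy = effectual_connectivity(0.1, 0.4, 0.20, 0.0, 0.1, 0.0, 200)
print(round(light[50], 3), round(heavy[50], 3))  # heavier load consolidates more slowly
```

Both trajectories start at P1(0), jump to P after the first step, and then creep towards Ppot; the run with the larger consolidation load lags behind because fewer silent synapses remain available for turnover.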

Figure 4 shows that the approximations accurately predict
microscopic model simulations. Consolidation becomes slower for
larger consolidation loads P1S, which limits maximal storage
capacity (panel A; see Section 6). Similarly, consolidation becomes
slower for increasing fractions P1(0) of initially consolidated
synapses (panel B). As P1 will correlate with the number of
previously consolidated memories and, thus, with age, this implies
that memory consolidation should be faster in young compared to
old subjects, even if the anatomical connectivity P would be
constant over lifetime. Moreover, the corresponding gradients in
Peff resulting after a fixed number of consolidation steps can be
related to gradients in memory performance in graded retrograde
amnesia (Section 7.2) and the absence of catastrophic forgetting
(Section 7.1). Finally, panel C shows that even slight increases in
anatomical connectivity (as reported after learning new concepts
or tasks [68]; cf. Fig. 7) can strongly speed up memory
consolidation if a large proportion of synapses are in the
consolidated state (as expected for adult networks after synaptic
pruning [14,15]).

Our analysis and further simulations (data not shown) reveal
that the described increase of Peff is very stable and occurs for
virtually any plausible configuration of model parameters. Before we
discuss the mentioned memory phenomena in more detail, the
following shows that, by increasing Peff, structural plasticity can
store much more information per synapse than Hebbian-type
weight plasticity.

6 How Much Information can a Synapse Store? 

It is a well-known result of information theory [69] that
optimally coding an entity taken at random from a set of n
different entities takes ld n bits of information (where
ld := log2). From this we can derive simple upper bounds for
the maximal information that a synapse can store by counting the
number of possible synaptic states, i.e., the number of possible
weights and locations, that can be realized by weight plasticity and
structural plasticity, respectively. The resulting upper bounds for
weight capacity C^wp and structural capacity C^sp are

$$C^{\mathrm{wp}} \le \mathrm{ld}\, N \quad \text{and} \quad C^{\mathrm{sp}} \le \mathrm{ld}\, n, \qquad (12)$$

assuming that weight plasticity can choose one out of N possible
discrete weights for an individual synapse, and structural plasticity
can choose between n targets where to grow a novel synapse.
These bounds could trivially be reached by an ideal observer that
has direct access to synaptic attributes (i.e., weights and locations).
However, here we are rather interested in how much information
a synaptic network can store and safely retrieve employing
biologically plausible mechanisms. In particular, we have to
measure the amount of retrieved information from plausible
neural output variables such as spikes or mean firing rates. For this
it is necessary to link our theory to concrete neural network models
of memory storage and retrieval, such as Willshaw- and Hopfield-type
models ([26,27,45,70,71]; see Section 3).

Our theory yields the surprising result that the weight capacity
C^wp in the brain might actually be negligible compared to
structural capacity C^sp. First, it is well understood that weight
capacity of biologically plausible memory models is limited by
hard theoretical bounds suggesting C^wp ≤ 0.72 bit per synapse
even for an infinite computing precision with N → ∞
[27,45,46,72,73]. Second, due to noisy transmission characteristics
and various adaptation mechanisms, real synapses are likely to
have a rather small number of functionally distinctive states,
perhaps N being on the order of ten or even binary [74-76].
Third, unlike N, the number of potential targets n may actually be
very large in the brain: For example, for a cortical neuron n is on
the order of 10^5 corresponding to the number of neighboring cells
within the same macrocolumn [24], and the number of targets n
may be even much larger because each neuron may have a large
number of functionally distinct dendritic compartments [28].
Fourth, it has been recently shown that the upper bound of
structural capacity can be tightly reached for synaptic pruning
following learning in completely connected networks [16,53].
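To get a feeling for the size of the two counting bounds, the following short sketch evaluates ld N and ld n for a few values. The numbers are illustrative assumptions (a binary or ten-state synapse, and a target count on the order of a macrocolumn), not measurements.

```python
import math

def ld(x):
    """Binary logarithm (written 'ld' in the text)."""
    return math.log2(x)

# Illustrative values only: N distinct weight states, n candidate targets.
for N, n in [(2, 10**5), (10, 10**5)]:
    print(f"N={N}, n={n}: ld N = {ld(N):.2f} bit, ld n = {ld(n):.2f} bit")
```

Even a synapse with ten distinguishable weights is bounded by a few bits, whereas choosing one target among many thousands of candidates allows a bound that is several times larger.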

Before generalizing these results to ongoing structural plasticity
in sparsely connected networks, let us first re-analyze the classical
Willshaw model (without structural plasticity) as illustrated in
Fig. 3A,B. There, synaptic weight plasticity follows a simple binary
Hebbian rule (Eq. 5). Due to pd|s = 0 (cf. Eq. 4) the fraction of
consolidated synapses p1 increases monotonically with M until it
reaches a maximal value p1ε beyond which the output noise ε̂
exceeds the tolerable level ε. Some theory presented in Sect.
Mathematical Analysis II.1 shows that the corresponding pattern
capacity Mε crucially depends on p1ε: For networks of size n,
randomly generated cell assemblies of size k, and input noise with
λ ∈ (0,1] and κ = 0, it is (see text below Eq. 28 in Sect.
Mathematical Analysis II.1)

[Equation 13, giving p1ε(Peff) and Mε(Peff), is not recoverable from
the extracted text.]   (13)

where factor σ ≈ (1 + (ln ε)/ln(k/n))^-1 comes close to one for
large networks. Multiplication by the stored information per
memory and dividing by the number of synapses gives the well-known
weight capacity of the Willshaw model (see Sect.
Mathematical Analysis II.1),

[Equation 14 is missing from the extracted text.]   (14)

where the upper bound C^wp = 0.69 bps can be reached for large
networks, Peff = 1, p1ε = 0.5, sparse activity k ~ log n, and zero
input noise with λ = 1.
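For the classical special case (fully connected network, no input noise, sparse patterns) the value of about 0.69 bit per synapse at a memory load of 0.5 can be checked numerically. The sketch below assumes the standard asymptotic expressions for the classical Willshaw model, namely memory load p1 = 1 − (1 − k²/n²)^M and capacity ld(p1)·ln(1 − p1); the function names are hypothetical, and the sketch does not cover the diluted case with Peff < 1.

```python
import math

def memory_load(M, k, n):
    """Fraction p1 of potentiated synapses after storing M random patterns
    with k active units each in a fully connected n-by-n binary network."""
    return 1.0 - (1.0 - (k / n) ** 2) ** M

def willshaw_capacity(p1):
    """Asymptotic stored information per synapse (bits) at memory load p1."""
    return math.log2(p1) * math.log(1.0 - p1)

# Scan memory loads 0.001 ... 0.999 and keep the best (capacity, load) pair.
best = max((willshaw_capacity(p / 1000), p / 1000) for p in range(1, 1000))
print(best)  # the maximum sits at p1 = 0.5 with ln 2, about 0.69 bit per synapse
```

The capacity function is symmetric around p1 = 0.5, which is why the optimum corresponds to a network in which half of the synapses are potentiated.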

In previous works on structural plasticity we have focused on
synaptic pruning of silent synapses after learning all memories in a
fully connected network (Fig. 3C). Here we extend these results to
networks with incomplete ("diluted") connectivity and ongoing
structural plasticity. Let us first consider synaptic pruning, which
has been described as one of three phases during brain
development (e.g., in humans, synaptic density increases until the
age of 2-3 years, then remains stable until 5 y, then decreases until
puberty and remains relatively stable during adulthood; cf.
[14,51,77]; see also Fig. 7):

1. Synaptic overgrowth: The synaptic generation rate is much
larger than the elimination rate, pg ≫ pe|s, such that anatomical
connectivity P can come close to potential connectivity Ppot.

2. Critical consolidation phase: Weight plasticity potentiates and
consolidates useful synapses that support memory contents
specified by the consolidation signal S, e.g., pc|s = s, pd|s = 1 − s.

3. Synaptic pruning: Useless synapses are eliminated, e.g.,
pe|0 ≫ pg (cf. Fig. 3C).

Because only a fraction p1 of the synapses survives phase three,
the total storage capacity at maximal memory load (where p1 = p1ε)
is obtained from renormalizing Eq. 14,

[Equation 15 is missing from the extracted text.]   (15)

Using p1ε from Eq. 13 reveals that C^tot ≈ ld n for sufficiently
small cell assembly sizes k (see Sect. Mathematical Analysis II.1).
Thus, the Willshaw model with structural plasticity comes close to
the information-theoretic capacity bound (Eq. 12). We have shown
elsewhere that C^tot = ld n can be reached tightly with much weaker
assumptions on cell assembly sizes and effectual connectivity by
inhibitory implementations of the Willshaw model [46,78] and
both excitatory and inhibitory implementations of Bayesian
networks with discrete synaptic weights [53,54,79].

Figure 7. Sketch of network connectivity reflecting lifelong
structural plasticity (axes: connectivity P over time t). During
development anatomical connectivity P (thick solid) quickly increases
reaching a peak level (around 2-3 y in humans), where the initial
increase is followed by a short period of stable connectivity (until
age 5 y in humans), a phase of significant decrease of connectivity
until puberty, and finally a phase of stable connectivity during
adulthood [14,51,77]. Recent experiments suggest a temporary
novelty-driven (thick arrows) increase of connectivity during
adulthood [23,68,116]. Our model of structural plasticity predicts
that learning is fastest for high levels of anatomical connectivity
and structural plasticity. Thus, memories acquired during early
phases can reach higher levels of effectual connectivity (thin solid
lines) compared to memories acquired during later phases. The
resulting gradients in effectual connectivity can explain various
memory phenomena (see Section 7 for details).
doi:10.1371/journal.pone.0096485.g007
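The three developmental phases (overgrowth, consolidation, pruning) can be mimicked in a few lines for a small hetero-associative binary network. This is a toy sketch under simplifying assumptions made here: every potential synapse is realized during overgrowth, consolidation follows the clipped Hebbian rule, and pruning removes exactly the silent synapses. Network size, pattern size and pattern number are arbitrary illustrative values.

```python
import random

def develop_and_prune(n=200, k=10, M=100, seed=0):
    """Toy run of the three phases for a hetero-associative binary network.

    Returns the fraction p1 of the n*n initial synapses that survive pruning.
    """
    rng = random.Random(seed)
    # Phase 1 (overgrowth): assume every potential synapse is realized and silent.
    consolidated = [[False] * n for _ in range(n)]
    # Phase 2 (consolidation): Willshaw-type Hebbian rule with p_c|s = s.
    for _ in range(M):
        u = rng.sample(range(n), k)   # active presynaptic units
        v = rng.sample(range(n), k)   # active postsynaptic units
        for i in u:
            for j in v:
                consolidated[i][j] = True
    # Phase 3 (pruning): silent synapses are removed, consolidated ones remain.
    survivors = sum(row.count(True) for row in consolidated)
    return survivors / (n * n)

p1_sim = develop_and_prune()
p1_theory = 1.0 - (1.0 - (10 / 200) ** 2) ** 100
print(round(p1_sim, 3), round(p1_theory, 3))
```

The surviving fraction should be close to the expected memory load 1 − (1 − k²/n²)^M, so that after pruning only about a fifth of the initial synapses carry the same memories in this example.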

Unlike in development, during adulthood anatomical connectivity
is stable. This means that ongoing generation and
elimination of synapses must be in homeostatic balance such that
the total number of synaptic connections remains approximately
constant over time [14,80,81]. In the following we show that
ongoing structural plasticity during adulthood can reach the same
high storage capacity as during development, although this process
may require significantly more time. The basic idea is that the
three developmental processing phases (synaptic generation,
consolidation, and elimination) run in parallel during each time
step t. For example, by choosing the synapse parameters

$$p_{c|s} = s, \quad p_{d|1} = 0, \quad p_{d|0} > 0, \quad \text{and} \quad P_\pi(t)\, p_g(t) = P_0(t)\, p_{e|s}(t) \qquad (16)$$

the anatomical connectivity P remains constant and, in essence, all
actual synapses "migrate" to the locations ij specified by the
consolidation signal Sij (cf. Fig. 3D). If S specifies Mε memories to
be stored, S is applied during each time step, and the consolidation
load P1S is sufficiently large such that P ≤ P1S Ppot, then
memories will be stored at effectual connectivity
Peff = P/P1S ≤ Ppot, there will be no silent synapses left, and the
resulting total capacity C^tot is given by Eq. 15. In particular, for
P = P1S Ppot the resulting network will be identical to the one
obtained by developmental learning described before (see Fig. 3D and
compare to Fig. 3C). This shows that also adult learning in
structurally plastic networks with constant low anatomical
connectivity can reach the information theoretic bound C^tot = ld n
(see Eq. 12).
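The end state of this "migration" can be written as a one-line rule. The sketch below covers the case stated in the text (all synapses end up at useful locations, so Peff = P/P1S); the second branch, where Peff saturates at Ppot and some silent synapses remain, is an assumption added here for completeness. The function name and example numbers are hypothetical.

```python
def adult_effectual_connectivity(P, P_pot, P_1S):
    """Effectual connectivity once all synapses have migrated to useful sites.

    Case P <= P_1S * P_pot follows the text (Peff = P / P_1S, no silent
    synapses left). The other branch (Peff saturates at P_pot) is an
    assumption added here for completeness.
    """
    if not (0 < P <= P_pot <= 1 and 0 < P_1S <= 1):
        raise ValueError("expected 0 < P <= P_pot <= 1 and 0 < P_1S <= 1")
    if P <= P_1S * P_pot:
        return P / P_1S
    return P_pot

print(adult_effectual_connectivity(P=0.1, P_pot=0.5, P_1S=0.25))  # 0.4
```

With P = 0.1 and a consolidation load of 0.25, a network with Ppot = 0.5 would thus store the memories as if it had an anatomical connectivity of 0.4.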

In the following we apply our theory to networks with
biologically relevant parameters. For example, a typical network
size may correspond to a cortical macrocolumn of size 1 mm³
containing about n = 10^5 neurons and relatively dense recurrent
connections with an anatomical connectivity of about P = 0.1
[24,25]. Then we can estimate potential connectivity Ppot from
experimental measurements of the filling fraction P/Ppot, defined as
the fraction of potential synapses that is actually realized (i.e., in
state 0 or state 1). For typical P/Ppot ≈ 0.2 [29], structural
plasticity of dendritic spines alone may account already for
Ppot ≈ 0.5 within a neocortical macrocolumn. The corresponding
storage capacities are depicted in Figure 5. Note that without
structural plasticity (Peff = P = 0.1) the storage capacity remains
tiny, e.g., C^wp of at most about 0.1 for P = 0.1. In particular,
sparse activity patterns [82] cannot be stored at a low connectivity,
e.g., k < 64 requires P > 0.1 to stabilize even a single memory
pattern.

By contrast, networks employing structural plasticity with
potential connectivity Ppot > 0.1 can have a large total capacity
C^tot ≫ 1. Interestingly, C^tot increases with decreasing connectivity.
Thus, even slight increases of effectual connectivity towards
Ppot ≈ 0.5 can strongly increase the number of stored memories (M)
and even maximize stored information per synapse (C^tot). Note
that an increase in Peff during consolidation would also allow a
simultaneous decrease of activity k to maximize capacity. This
means that consolidation involving structural plasticity and
sparsification will move the "working point" from the lower right
towards the upper left in the contour plots of Fig. 5. Thus, by
emulating high effectual connectivity, structural plasticity may also
support the sparsification of memory representations [82-85] and
stabilize small cell assemblies that would appear unstable for a
fixed low connectivity [86,87].

The following sections show that structural plasticity, in addition
to increasing storage capacity, can explain several well known
memory phenomena in the brain much better than previous
theories.

7 Relevance of Structural Plasticity for Memory Phenomena

7.1 Absence of Catastrophic Forgetting. Artificial neural
networks such as multi-layer perceptrons are well known to suffer
from what was called catastrophic forgetting (CF) or the
stability-plasticity dilemma [36,88-91]. It is the problem that
optimizing synaptic weights to store a set of new memories will
deteriorate or even destroy previous memories. Freezing synaptic
weights can avoid CF, but it also hampers the ability to learn new
memories.

Another form of CF has been described for Hopfield-type
network models of associative memory [92]. Here CF means that a
neural network with fixed structure can almost perfectly store and
retrieve memories until the maximal pattern capacity Mε is
reached. However, exceeding Mε even by a few additional
patterns can destroy the ability to retrieve any of the memories.
The same problem occurs when increasing the number of stored
memory patterns in the Willshaw-type binary learning models
(Fig. 3A,B), even before the point where all synapses are uniformly
potentiated and therefore have lost specific information about the
memory patterns.
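The breakdown in Willshaw-type networks is easy to reproduce in a toy simulation. The following sketch stores random pattern pairs with clipped Hebbian learning in a small fully connected binary network and counts wrongly activated output units during recall. Network size, pattern size, pattern numbers and the threshold rule (all k inputs must be connected) are illustrative assumptions, not the configuration used for the figures.

```python
import random

def willshaw_false_positives(n, k, M, trials=20, seed=1):
    """Store M >= 1 random pattern pairs in a fully connected binary (Willshaw)
    network and return the mean number of wrongly active output units
    when recalling stored pairs from their complete input patterns."""
    rng = random.Random(seed)
    W = [[0] * n for _ in range(n)]
    pairs = []
    for _ in range(M):
        u, v = rng.sample(range(n), k), rng.sample(range(n), k)
        pairs.append((u, v))
        for i in u:
            for j in v:
                W[i][j] = 1                      # clipped Hebbian learning
    tested = pairs[:trials]
    errors = 0
    for u, v in tested:
        target = set(v)
        for j in range(n):
            potential = sum(W[i][j] for i in u)  # dendritic potential of unit j
            if potential >= k and j not in target:
                errors += 1                      # unit fires although not in pattern
    return errors / len(tested)

for M in (50, 400, 2000):
    print(M, willshaw_false_positives(n=100, k=5, M=M))
```

For small M recall is essentially error free, while for large M almost every output unit fires, which affects old and new memories alike.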

CF poses problems for technical applications, but also for
modeling memory processes because it does not normally occur in
our brains. It has been argued that the capacity of the brain might
just be too large for running into CF during a normal lifetime. In
addition, several alternative solutions have been suggested. For
example, many previous approaches suggested an
additional hidden neural layer (e.g., between populations u and
v) in which a new node is allocated for each new input that
deviates significantly from previously stored items. The underlying
idea is that in a modular organization, separate subnetworks
(comprising different subsets of neurons in the intermediate layer)
could be trained independently to represent different memories or
categories. Such approaches include ART-type architectures [90],
emergent category-specific modularity [93], hard-wired modularity
[94], and also ideas involving grandmother cells [95] or, in
technical terms, look-up tables [16]. One problem with these
approaches is that some high-level mechanism is required for
allocating or even generating new neurons in the intermediate
layer. However, in most parts of the adult brain, there is little
evidence for structural plasticity involving neurogenesis. But
without neurogenesis such models also predict catastrophic
forgetting at a later time unless plasticity is explicitly switched off
after all neurons in the intermediate reservoir have been allocated.
Alternative high-level mechanisms for preventing CF involve
pseudo-rehearsal using self-generated training stimuli from
previously learned memories [92]. In the following we are focusing on
solutions to CF that can be built at the level of synapses. For
example, palimpsest network models [96-98] assume a slow
decay of synaptic weights (pd > 0) to prevent approaching the
network's capacity limit; such models, however, are not plausible for
long-term storage in neocortex. Similarly, synaptic cascade models
[52] introduce several consolidated states 1^(i) with decreasing decay
rates pd^(i) > pd^(i+1). However, this cannot prevent exponential decay
of memories unless the lowest decay rate is zero, which again causes
CF.

A novel role in preventing CF can be attributed to structural
synaptic plasticity: Fig. 6A illustrates simulation experiments
investigating consolidation of multiple memory blocks each
consisting of several novel memories. Each memory block is
stored in the hippocampus and replayed to neocortical cell
populations u and v for a certain time as described before (Fig. 2B,C).
As expected, without any structural plasticity (pg = pe = 0) the
network exhibits CF when approaching the capacity limit (upper
panel). In contrast, CF is absent in networks with structural
plasticity (lower panel). In this case, early stored memories remain
stable all the time whereas the ability to store novel memories
fades gradually when approaching the capacity limit. This
behavior is more consistent with aging effects of human memory
[99] and results from the fraction of consolidated synapses steadily
increasing with age and the number of stored memories.
Correspondingly, the fraction of unconsolidated synapses
participating in structural plasticity gradually decreases with age as
observed in neurophysiological experiments [21].

More precisely, for memories stored with a certain effectual
connectivity Peff, structural plasticity can prevent CF only if the
filling fraction is below the maximal fraction of consolidated
synapses at the capacity limit, P/Ppot < p1ε(Peff) (see Eq. 13). This
condition ensures that the total number of synapses, P n², is smaller
than the maximally allowed number of consolidated synapses,
p1ε(Peff) Ppot n², at the network's capacity limit. If fulfilled, the
network can never exceed its capacity limit, which effectively
prevents catastrophic forgetting. Brain networks could satisfy this
condition by maintaining a constant (or slowly decreasing; cf.
Fig. 7) anatomical connectivity P and by adapting cell assembly
size k appropriately in relation to network size n and some target
effectual connectivity Peff. Thus, early memories can be
consolidated up to some target connectivity Peff which depends on the
replay time per memory block. However, at least if replay time per
memory remains constant over lifetime, then for later memories
Peff and p1ε(Peff) will decrease gradually with the decreasing
fraction of available structurally plastic synapses, P − P1 (see
Fig. 4B). Therefore, the ability to learn new memories will begin to
fade when p1ε(Peff) approaches P/Ppot.

7.2 Ribot gradients in retrograde amnesia. Patients with 
lesions of the hippocampus or neighboring neocortex in the medial 
temporal lobe often suffer from graded retrograde amnesia 
[38,40,100,101]. This form of memory loss shows characteristic 
"Ribot gradients" describing the tendency that recently stored 
memories are more likely to be lost than remote memories 
acquired at an earlier time. Simple palimpsest-type memory
models (with pd > 0) cannot account for these findings; in fact, they
predict the reverse effect [96-98].

A body of previous work has proposed that the lesions may
disrupt cortico-hippocampal memory replay and, as a result,
recent memories disappear because they are not sufficiently
consolidated in intact neocortex [34,35,38,39,102-104]. According
to such models, the cause of Ribot gradients is a gradient in
accumulated replay and consolidation time [102,104].

In one of the models [102], for example, replay is controlled by
a random walk over the attractor landscape in Hopfield-type
networks where each stored memory corresponds to one of the
attractors. After acquiring the μth memory, each memory obtains
a 1/μ share of replay time. It is concluded that Ribot gradients
occur because early memories (smaller μ) can accumulate a larger
total consolidation time of about ∑_{μ′≥μ} 1/μ′ than recent
memories, resulting in a larger strength of the memory trace.
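The replay-share argument amounts to a tail sum of the harmonic series. The sketch below computes, for each memory, the replay time accumulated until M memories have been acquired; the unit epoch length and the finite horizon M are assumptions made here for illustration, and the function name is hypothetical.

```python
def accumulated_replay_share(M):
    """Total replay time of memory mu (1-based) after M memories have been
    acquired, if every stored memory gets an equal 1/m share of each
    acquisition epoch m = mu, ..., M (epochs of unit length are assumed)."""
    totals = []
    tail = 0.0
    for m in range(M, 0, -1):      # accumulate the harmonic tail sum from the end
        tail += 1.0 / m
        totals.append(tail)
    return totals[::-1]            # totals[mu - 1] = sum over m = mu..M of 1/m

shares = accumulated_replay_share(5)
print([round(s, 3) for s in shares])  # → [2.283, 1.283, 0.783, 0.45, 0.2]
```

The first memory accumulates more than ten times the replay time of the fifth one, which is the gradient such replay models rely on.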

Such models predict either that memories would be replayed
and consolidated for an unlimited time [102] or that Ribot
gradients would occur only for memories acquired during a
limited time interval before the lesion occurred [104]. Although
there are not yet final experimental answers [34,105], both
predictions may be in conflict with evidence that novel memories
are buffered and replayed by the hippocampus for a limited time
only [34,38,39] and that, depending on the lesion size, graded
amnesia can reach back to early childhood [38].

Synaptic learning based on structural plasticity offers an
alternative explanation for Ribot gradients without relying on
unlimited memory replay (Fig. 6B). According to our model, the
substrate of Ribot gradients are gradients in effectual connectivity
Peff instead of (or in addition to) gradients in accumulated
consolidation time. Even with constant replay time per memory,
remote memories are stored with a larger Peff than recent
memories, for the very same reasons that explained the absence of
catastrophic forgetting. Correspondingly, output noise ε̂ will be
largest for most recent memories. During normal operation ε̂ is
sufficiently low to accurately retrieve both remote and recent
memories. However, cortical or hippocampal lesions will increase
noise levels such that memories get lost for which Peff is below
some critical value, or equivalently, that have been stored after
some critical time point.

7.3 Spacing effect. Another interesting feature of memory is 
that learning new items is more effective if rehearsal is spaced over 
time compared to single block rehearsal [41-43,106]. For 
example, learning a list of vocabulary items in two sessions each
lasting 10 minutes turns out to be more effective than learning in a
single session lasting 20 minutes. This so-called spacing effect is
remarkably robust and occurs in many explicit and implicit
memory tasks in humans and many animals, being effective over
many time scales from single days to months.

Previous cognitive models attributed the spacing effect either to 
deficient processing of repeated items during single block rehearsal 
[107] or to improved storage by exploiting context variability 
between spaced rehearsal sessions [108]. Typically, these explanations
presumed specific high-level structures and mechanisms of
memory systems including attention, novelty, and context 
processing. Although detailed modeling of memory systems may 
be required to explain specific properties in particular memory 
tasks, the ubiquity of the spacing effect suggests a common 
underlying mechanism at the cellular level. We propose that 
structural plasticity in sparsely connected neural networks is such a 
mechanism. 

Figure 6C shows that structurally plastic networks reproduce the 
spacing effect naturally when learning a new set of memories in a 
similar protocol as described for the previous simulations (only 
here the memory replay should be interpreted more generally as 
rehearsal, not necessarily generated by the hippocampus). In the 
first simulation (blue) the memories are rehearsed in a single long 
time block, while in the second simulation (red) rehearsal is spaced 
over several shorter blocks such that total rehearsal time is equal 
for both simulations. For spaced rehearsal the resulting effectual
connectivity Peff of the memories turns out to be much higher
and, correspondingly, the output noise ε̂ much lower than for
single block rehearsal.

Further simulation experiments (not shown) have indicated that
the spacing effect induced by structural plasticity is very stable.
Similar to the psychological experiments, it is remarkably difficult
to find conditions without spacing effect. In essence, the spacing
effect occurs if weight plasticity is faster than structural plasticity
and if consolidated synapses are more stable than silent synapses
(pe|0 > pe|1). Both properties are strongly supported by experiments
[4,10,21,109]. In this case, our theory predicts that even in brief
rehearsal sessions Hebbian plasticity can quickly consolidate all
available synapses useful to store a set of memories. Thus, instead
of continuing a rehearsal session, it is better to wait until structural
plasticity has grown additional useful synapses that can then be
consolidated in a brief second rehearsal session. As a consequence,
spacing effects will necessarily occur whenever learning in the
brain depends on structural plasticity. Interestingly, our model
with structural plasticity can also quantitatively reproduce
long-term spacing effects as recently observed in psychological
experiments that investigated optimal spacing intervals to
maximize memory retention [110,111].
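The mechanism behind the spacing effect (fast consolidation during rehearsal, slow structural turnover in between) can be sketched with a toy mean-field model. Everything quantitative here is an assumption for illustration: a small consolidation load, no old memories, a constant generation probability that balances elimination, and hypothetical parameter values and session lengths. It is not the simulation shown in Fig. 6C.

```python
def rehearse(schedule, P=0.1, P_pot=0.5, p_e0=0.05):
    """Toy mean-field model of one memory set in a structurally plastic network.

    schedule: sequence of booleans, True = rehearsal (consolidation signal on).
    Returns effectual connectivity Peff after the last step.
    Assumptions (not from the paper): small consolidation load, no old
    memories, constant generation probability p_g balancing elimination.
    """
    p_g = p_e0 * P / (P_pot - P)     # homeostatic balance for a mostly silent network
    consolidated = 0.0               # Peff: consolidated synapses at useful sites
    silent = P                       # silent synapses that happen to sit at useful sites
    for rehearsing in schedule:
        if rehearsing:               # fast weight plasticity: consolidate at once
            consolidated += silent
            silent = 0.0
        free = P_pot - consolidated - silent
        silent += p_g * free - p_e0 * silent   # slow structural turnover
    return consolidated

# Same total rehearsal time (20 steps) and same total duration (200 steps).
blocked = rehearse([True] * 20 + [False] * 180)
spaced = rehearse(([True] * 5 + [False] * 45) * 4)
print(round(blocked, 3), round(spaced, 3))
```

In this toy setting the spaced schedule ends with a clearly higher effectual connectivity, because each pause lets turnover place new silent synapses at useful locations, which the next short session consolidates in a single step.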

Discussion 

One important limitation in the brain seems to be the number
or density of functional (non-silent) synapses, both for anatomical
and metabolic reasons. For example, the number of synapses per
cortical volume is remarkably similar across different species [112],
and theoretical considerations suggest that the energy consumption
of the brain is dominated by the number of postsynaptic
potentials or, equivalently, the number of functional non-silent
synapses [47-49]. In the face of these limitations, it might be
beneficial that learning in brain circuits "moves" synapses to
computationally useful locations [16,31,53,113].

To get a quantitative grip on these ideas we have introduced the
concept of effectual connectivity, a macroscopic measure for how
useful network structure is for memory storage. Structural
plasticity can increase effectual connectivity while keeping the
anatomical connectivity (P) at a low constant level. This has been
analyzed for a simple model of structural plasticity assuming the
following three basic mechanisms: (1) blind synaptogenesis, (2)
consolidation of useful synapses, and (3) elimination of irrelevant
synapses. Further, we have focused on the most plausible
parameter range where structural plasticity (1,3) operates on a
slower time scale than weight plasticity and consolidation (2), but
the lifetime of consolidated synapses is long compared to the
turnover of unstable synapses (see Section 2 and Section 5 for
details; cf. [4,10,21]). In our current model implementation we
identify strong synapses with stable synapses (weight and state 1) as
well as weak synapses with unstable synapses (weight and state 0).
This contrasts with some experimental results suggesting that silent 
synapses could be quite stable [114] whereas even strong synapses 
could be eliminated, for example, during development [51]. Such 
findings may be explained by the probabilistic nature of state 
transitions in our synapse model or a dissociation between synaptic 
strength and stability, perhaps including a cascade of several 
different stable and unstable states [52]. 

Our model is applicable to learning during development, as well
as during adulthood (Fig. 7). During development the three
mechanisms appear to dominate different phases separated on a
large time scale of years [14-16,51,77,115]. Still, on a smaller time
scale of days or months [20,21,23], ongoing structural plasticity,
involving the three mechanisms simultaneously, could control the
anatomical connectivity to be approximately constant (see Eq. 16).
Such homeostatic regulation of generation and elimination of
synapses is even more evident during adulthood where the
anatomical connectivity appears almost stable over several decades
[14,51,77]. However, recent experiments demonstrate that there
can be novelty-driven excursions from homeostatic balance on the
time scale of several days in specific cortical areas of the adult
brain, for example, during learning of motor memories
[23,68,116]. This phenomenon can be understood within our
modeling framework as a different control strategy of the
anatomical connectivity, one which is driven by learning load.
Specifically, in instances of high learning load, up-regulating the
anatomical network connectivity is the means to achieve faster
learning by increasing the number of unstable silent synapses that
may be recruited into new memories by structural plasticity and
consolidation. Taken together, the model can explain the major
differences of structural plasticity during development and
adulthood by shifts in how metabolic constraints and learning
speed are leveraged.

To simulate structural and weight plasticity we have used a
simple three-state Markov model of a potential synapse where state
transition probabilities (with exception of pg) depend on a
Hebbian-type consolidation signal Sij (see Fig. 2A, Eq. 4). Our
plasticity model generalizes the binary Willshaw model [26,44]
and strongly simplifies realistic weight plasticity models, for
example, those based on spike-timing dependent synaptic plasticity
(STDP) where potentiation depends on the precise temporal order
of presynaptic and postsynaptic spikes [117-119]. In fact, it has
been discussed controversially whether STDP-type learning rules
would at all be consistent with the Hebbian idea that "what fires
together wires together" because, unlike the Willshaw model,
simple STDP models predict decoupling of neurons firing at the
same time [120-123]. However, we have recently shown that
more realistic STDP models (including dendritic propagation
delays and parameters fitted to physiological data) are generally
consistent with Hebbian learning and local cell assemblies [124].
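A single update of such a three-state potential synapse might look as follows. This is a minimal sketch: the state labels, the transition graph (not realized ↔ silent ↔ consolidated), and the order in which consolidation and elimination are tested within one step are assumptions made here, since Fig. 2A and Eq. 4 are not reproduced in this section. All function and parameter names are hypothetical.

```python
import random

PI, SILENT, CONSOLIDATED = "pi", 0, 1   # potential only / silent / consolidated

def step(state, s, p_g, p_e, p_c, p_d, rng=random):
    """One update of a potential synapse given consolidation signal s in {0, 1}.

    p_e, p_c, p_d are functions of s (elimination, consolidation,
    deconsolidation); p_g does not depend on s. The transition graph
    pi <-> 0 <-> 1 and the order 'consolidate before eliminate' are
    assumptions of this sketch.
    """
    if state == PI:
        return SILENT if rng.random() < p_g else PI
    if state == SILENT:
        if rng.random() < p_c(s):
            return CONSOLIDATED
        return PI if rng.random() < p_e(s) else SILENT
    # state == CONSOLIDATED
    return SILENT if rng.random() < p_d(s) else CONSOLIDATED

# Example: a synapse at a useful location (s = 1) with immediate consolidation.
state = PI
for _ in range(50):
    state = step(state, 1, p_g=0.1,
                 p_e=lambda s: 0.2 * (1 - s),
                 p_c=lambda s: float(s),
                 p_d=lambda s: 0.0)
print(state)
```

With a consolidation probability equal to s and no deconsolidation, a synapse at a useful location is consolidated as soon as it has been generated and then stays in state 1.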

Similarly, we argue that our model is also consistent with more 
realistic models of structural plasticity based on homeostatic 
mechanisms for maintaining mean neuronal firing rates at a 
constant level [20,50]. In such models, generation and elimination 
of synapses is induced by firing rates being below and above the 
homeostatic level, respectively. This is similar to our model with a 
homeostatic constraint for maintaining a constant anatomical 
connectivity P (see Section 2), because the mean firing rate of a 
neuron (e.g., during phases of ongoing activity [125]) will strongly 
correlate with the number of synapses on its dendrite (cf. [53,126]). 
Thus, keeping firing rates in homeostasis is essentially equivalent 
to maintaining the number of synapses per neuron and, thus, P, at 
a constant level. In our simulations, we have explicitly adjusted the 
generation rate Pg in each step in order to keep P constant, but in 
a more realistic setting, Pg could as well be driven by factors 
representing each neuron's mean firing rate. 
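
The homeostatic control described above can be illustrated with a minimal simulation sketch. It is not the simulation code used for the figures. It assumes a population of potential synapses with three states (π: potential but not realized, 0: realized but unconsolidated, 1: consolidated) and the transitions π→0 (generation), 0→π (elimination), 0→1 (consolidation) and 1→0 (deconsolidation). The function name, the parameter values, and the way the generation rate is computed from the current deficit of real synapses are illustrative assumptions.

```python
import numpy as np

PI, SILENT, CONS = 0, 1, 2  # states pi, 0 and 1 of a potential synapse

def simulate(n_syn=20000, P=0.1, frac_s1=0.2, steps=50,
             p_e=(0.5, 0.0), p_c=(0.0, 1.0), p_d=(0.0, 0.0), seed=0):
    """Simulate n_syn potential synapses; tuples are indexed by signal s in {0, 1}."""
    rng = np.random.default_rng(seed)
    s = (rng.random(n_syn) < frac_s1).astype(int)        # consolidation signal per synapse
    state = np.where(rng.random(n_syn) < P, SILENT, PI)  # initially a fraction P is realized
    target = int(round(P * n_syn))                       # homeostatic set point
    p_e, p_c, p_d = (np.asarray(x) for x in (p_e, p_c, p_d))
    history = []
    for _ in range(steps):
        r = rng.random(n_syn)
        silent, cons = state == SILENT, state == CONS
        consolidate = silent & (r < p_c[s])
        eliminate = silent & ~consolidate & (rng.random(n_syn) < p_e[s])
        deconsolidate = cons & (r < p_d[s])
        state[consolidate] = CONS
        state[eliminate] = PI
        state[deconsolidate] = SILENT
        # Homeostasis: choose the generation rate so that the expected
        # number of real synapses returns to the set point.
        free = np.flatnonzero(state == PI)
        deficit = target - np.count_nonzero(state != PI)
        if deficit > 0 and free.size > 0:
            p_g = min(1.0, deficit / free.size)
            state[free[rng.random(free.size) < p_g]] = SILENT
        P_now = np.count_nonzero(state != PI) / n_syn
        P_eff = np.count_nonzero((state == CONS) & (s == 1)) / max(1, np.count_nonzero(s == 1))
        history.append((P_now, P_eff))
    return np.array(history)  # columns: anatomical and effectual connectivity

history = simulate()
```

In this sketch the anatomical connectivity stays close to its set point while the fraction of consolidated synapses at locations with signal 1 grows, because unconsolidated synapses at irrelevant locations are eliminated and regenerated elsewhere.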

Thus, we argue that both Hebbian and homeostatic structural 
plasticity are necessary to optimize information storage: Hebbian 
structural plasticity (via $p_{e|s}$) is necessary to eliminate those 
synapses that are not useful for storing a memory set. But 
homeostatic structural plasticity (via Pg) is also necessary. First, it 
balances the requirements of fast learning (large P) against space and 
energy efficiency (low P). Second, homeostatic structural plasticity 
may also contribute to uniformly sampling new memory representations 
$v^\mu$ from the space of all possible activity patterns (with unit 
usages $\#\{\mu : v_j^\mu = 1\}$ being equal for all neurons $v_j$), which is 
known to be optimal for minimizing output noise and maximizing 
storage capacity in multi-layer networks (see Fig. 7 in [127]; cf. 
[126,128,129]). For example, a neuron representing only a few 
memories will have few state-1 synapses and, correspondingly, low 
firing rates. This may increase Pg to generate new state-0 synapses, 
rendering this neuron more plastic and receptive for being used to 
represent new memories, thereby increasing state-1 synapse 
number and firing rates until the desired homeostatic level is 
reached. Some previous works have actually argued that non- 
Hebbian homeostatic structural plasticity could be sufficient to 
explain memory formation [18,130]. Although this may hold true 
if cell assemblies representing different memories were 
spatially separated with only little overlap, our results emphasize 
also the need of Hebbian-type structural plasticity with a specific 
elimination of unconsolidated synapses. Without Hebbian struc- 
tural plasticity it seems impossible to stabilize a larger number of 
overlapping cell assemblies and to come close to the high memory 
capacity of our model [16]. 

By introducing the concepts of effectual connectivity Peff and 
consolidation signal S, our theory remains largely independent of 
a specific underlying neural network model of memory. In fact, the 
performance of the specific model in terms of output noise $\hat{\epsilon}$ is 
generally a non-linear monotonic function f of effectual connectivity, 
e.g., $\hat{\epsilon} = f(P_{\mathrm{eff}})$, where f depends on the network model, 
network size, number of active units per memory vector, number 
of stored memories, and other factors. Here we have investigated 
Willshaw-type networks with binary synapses [16,26,44] because 
they give a simple and intuitive answer to the question which 
synapses are irrelevant and thus eligible for pruning. However, the 
efficiency of structural plasticity generalizes to learning employing 
graded synaptic states [53,54,79]. Previous approaches to memory 
formation by structural plasticity have also discussed that 
memories could be encoded in the number of synapses rather 
than by changing weights of individual synapses [28]. 

There are several lines of evidence suggesting that the binary 
weight model (corresponding to states 0 and 1) is already quite 
useful, in particular, if one would add suitable noise terms to 
account for distributed synaptic strength: First, experiments 
indicate that real synapses may have only a small number of 
functionally distinctive states or may even be binary [74-76,131]. 
Second, real synapses tend to scale their strengths such that in the 
soma (where spikes are generated) the resulting postsynaptic 
potentials have a relatively constant amplitude [61]. Third, 
anatomical experiments have shown that the number of real 
synapses per connected neuron pair is relatively constant in 
cortical areas [59] which indicates active regulation, for example, 
based on spike correlations [132,133]. Together, these findings 
support the hypothesis that the number of synapses per neuron 
pair and the strength of synapses at different dendritic locations 
might be co-regulated in order to keep the effect of a neuron on 
a connected neighbor close to a desired constant magnitude. From a 
functional viewpoint, this makes perfect sense at least for some 
functions such as memory storage (or the storage of "random" 
memory indices [134]) where binary synapses are optimal for 
storing sparse neural activity patterns [46,53,73]. 

Although our definition of effectual connectivity Peff is tailored 
for the analysis of structural plasticity and memory storage, it 
shares many features with previous definitions of effective 
connectivity, e.g., based on "Granger causality" or "transfer 
entropy" used for analyzing the functional structure of brain 
networks from measured neural activity [135-137]. For example, 
transfer entropy $T_{u \to v}$ [137] is a measure of the directional 
information flow from one brain area u to another area v. In the 
simplest case the transfer entropy between activities u(t) and v(t) 
measured in two brain areas u and v is defined as 

$$ T_{u \to v} := \sum p\big(v(t+1), v(t), u(t)\big) \, \log \frac{p\big(v(t+1) \mid v(t), u(t)\big)}{p\big(v(t+1) \mid v(t)\big)} $$

where p denotes the distribution of activity patterns, see Eq. 4 in [137] for 
details. This measure is very similar to the transinformation-based 
capacity measure $C^{wp}(P_{\mathrm{eff}})$ (see Eqs. 10,14) which depends 
monotonically on Peff, rendering effectual connectivity an equivalent 
measure of how well an input activity pattern $u^\mu$ in one area 
can reactivate a corresponding target pattern in another area. 
In fact, the equivalence of the two measures, $T_{u \to v} \sim C^{wp}(P_{\mathrm{eff}})$, 
can be shown for a simplified model of neural activity propagation 
in brain areas [138]. 
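
To make the transfer entropy definition concrete, the following is a minimal sketch of a plug-in estimator for two discrete activity sequences with history length one. The function name and the toy data are illustrative assumptions; the estimator simply replaces the distribution p by relative frequencies and is not the analysis used in [137] or [138].

```python
import numpy as np
from collections import Counter

def transfer_entropy(u, v):
    """Plug-in estimate (in bits) of the transfer entropy from u to v."""
    u, v = list(u), list(v)
    # Joint counts of (v(t+1), v(t), u(t)).
    triples = Counter(zip(v[1:], v[:-1], u[:-1]))
    n = sum(triples.values())
    count_vu = Counter()  # counts of (v(t), u(t))
    count_vv = Counter()  # counts of (v(t+1), v(t))
    count_v = Counter()   # counts of v(t)
    for (v1, v0, u0), c in triples.items():
        count_vu[(v0, u0)] += c
        count_vv[(v1, v0)] += c
        count_v[v0] += c
    te = 0.0
    for (v1, v0, u0), c in triples.items():
        p_given_both = c / count_vu[(v0, u0)]           # p(v(t+1) | v(t), u(t))
        p_given_self = count_vv[(v1, v0)] / count_v[v0]  # p(v(t+1) | v(t))
        te += (c / n) * np.log2(p_given_both / p_given_self)
    return te

# Toy example: area v copies the activity of area u with a delay of one step.
rng = np.random.default_rng(1)
u = rng.integers(0, 2, size=20000)
v = np.roll(u, 1)  # v(t+1) = u(t)
te_uv = transfer_entropy(u, v)
te_vu = transfer_entropy(v, u)
```

In this toy example the estimate for the direction u to v is close to one bit, whereas the reverse direction is close to zero, which mirrors the directional character of the measure.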

Adding to previous results of storage capacity based on counting 
possible synaptic network configurations [28-30] (cf. Eq. 12), our 
model proves that simple memory networks of n neurons with 
structural plasticity can indeed store and retrieve up to $C^{tot} \sim \log n$ 



bits per synapse. By comparison, even with real-valued synapses 
that have an infinite number of states, Hebbian-type weight 
plasticity without structural plasticity achieves less than one bit per 
synapse [72,73,139,140]. Technical adaptations of our model to 
applications such as information storage and pattern recognition 
have exhibited advantages in terms of recognition time and 
memory requirements compared to methods based on traditional 
weight plasticity [16,53,127]. 

Besides increasing storage capacity and energy efficiency of 
neural networks, our results suggest that structural plasticity is a 
key element in understanding various memory phenomena. One 
key prediction of the model under homeostatic maintenance of 
anatomical connectivity P is the emergence of time-dependent gradients in 
effectual connectivity Peff, such that memories from an earlier 
time have higher Peff than memories from a later time. These 
gradients occur because consolidation of an increasing number of 
memories will continuously decrease the number of "migratable" 
(not yet consolidated) synapses and, thus, learning of new 
memories becomes slower and slower. We have shown that such 
gradients in Peff can explain both aging effects and the absence of 
catastrophic forgetting because learning may stop just before the 
number of stored memories reaches the critical capacity limit 
[31,36,99]. The same gradients in Peff can also explain Ribot 
gradients in amnesic patients suffering from lesions of the medio-temporal 
lobe [38-40]. Ribot gradients can also be explained by 
gradients in accumulated consolidation time, assuming unlimited 
cortico-hippocampal consolidation [102,104]. However, our 
model is unique in producing Ribot gradients even for finite 
consolidation times, in accordance with findings of a time-limited 
role of the hippocampal system in consolidation [34,38,39]. 

Last, our model is able to bridge different models describing the 
spacing effect [43] on psychological [41,42,106] and molecular 
levels [141] by identifying structural synaptic plasticity as the 
potential cellular mechanism for spacing effects. The presence of 
structural plasticity in the adult brain is not only strongly 
supported by recent experimental evidence; as our results show, 
it is also necessary to achieve high storage capacity and energy 
efficiency, and it inevitably causes spacing effects. Structural 
plasticity is consistent with psychological theories that explained 
the spacing effect by encoding variability [106,108] but attributes 
the increased variability for spaced rehearsal to the changing 
pattern of synaptic connections rather than a changing learning 
context. While previous models based on delayed synaptic 
consolidation induced by molecular signaling cascades [52,141] 
may account for short-term spacing effects on the time-scale of 
minutes, structural plasticity can also explain long-term spacing 
effects on the time scale of months to years [110,111]. As the 
temporal profile of optimal learning depends on parameters of 
structural plasticity, predictions from theories of structural 
plasticity will be testable by future experiments that can link 
memory performance (behavioral data) and structural plasticity 
(physiological data) in cortical areas where these memories are 
stored. 

Mathematical Analysis 

I Temporal Dynamics of Effectual Connectivity Peff 

I.1 Relation between synapse and network states. As 
will be shown, effectual connectivity Peff is a macroscopic network 
state that can be computed from the (microscopic) states of 
individual potential synapses. For this we first have to describe the 
relation between microscopic synaptic state variables $p_{\mathrm{state}}^{(s)}(t)$ (Eq. 
4) and the corresponding macroscopic connectivity variables 
$P_{\mathrm{state}}(t)$. As indicated in the main text this relation is non-trivial 
(see text below Eq. 4), because there may be multiple actual and/ 
or potential synapses between each neuron pair ij, whereas 
connectivity of a neuron pair ij has to be defined in terms of the 
presence of at least one synapse or the absence of all synapses. For 
example, we could define neuron pair ij to be in state 1 if there is 
at least one potential synapse ij' that is in state 1. Similarly, we 
define that state(ij) = 0 if state(ij) ≠ 1 and there is at least one real 
synapse with state(ij') = 0. Finally, state(ij) = π if state(ij) ∉ {0,1} 
and there is at least one potential synapse with state(ij') = π. 

Next we divide neuron pairs into distinct groups, where two 
neuron pairs are in the same group if they receive identical 
consolidation signals s(t). Then, in analogy to Eq. 4 we can define 

the (macroscopic) fractions $P_{\mathrm{state}}^{(s)}$ of neuron pairs ij belonging to 
group s = Sij and being in a certain state ∈ {π,0,1}, 

$$ P_1^{(s)} = P_{\mathrm{pot}}^{(s)} \sum_{n=1}^{\infty} p(n) \left(1 - \left(1 - p_1^{(s)}\right)^n\right) \qquad (17) $$

$$ P_\pi^{(s)} = P_{\mathrm{pot}}^{(s)} \sum_{n=1}^{\infty} p(n) \left(p_\pi^{(s)}\right)^n \qquad (18) $$

$$ P_0^{(s)} = P_{\mathrm{pot}}^{(s)} - P_1^{(s)} - P_\pi^{(s)} \qquad (19) $$

where $P_{\mathrm{pot}}^{(s)}$ is the fraction of neuron pairs that have a potential 
synapse and receive consolidation signal Sij = s (typically Ppot times 
the fraction of neuron pairs receiving signal s, if the matrix of 
potential connections is independent of the stored memories), and 
p(n) is the probability that there are exactly n potential synapses 
given that there is at least one potential synapse for neuron pair ij. 
See ref. [59] for neuroanatomical estimates of p(n) in various 
cortical areas. 
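
The mapping from microscopic synapse states to macroscopic pair states can be sketched in a few lines. The example below is an illustration under the assumption that the potential synapses of one neuron pair are in their states independently of each other, which is what the product form suggests: a pair is in state 1 unless all of its n synapses miss state 1, in state π only if all n synapses are in state π, and in state 0 otherwise. The function name and the example distribution p(n) are hypothetical.

```python
import math

def macroscopic_fractions(p1, p_pi, P_pot_s, p_n):
    """Map microscopic state probabilities to fractions of neuron pairs.

    p1, p_pi : probability that a single potential synapse is in state 1 or pi
    P_pot_s  : fraction of neuron pairs with a potential synapse in group s
    p_n      : dict mapping n to p(n), the distribution of the number of
               potential synapses per potentially connected pair
    """
    # Pair is in state 1 if at least one of its n synapses is in state 1.
    P1 = P_pot_s * sum(pn * (1.0 - (1.0 - p1) ** n) for n, pn in p_n.items())
    # Pair is in state pi only if all n synapses are in state pi.
    P_pi = P_pot_s * sum(pn * p_pi ** n for n, pn in p_n.items())
    # Remaining pairs have no state-1 synapse but at least one state-0 synapse.
    P0 = P_pot_s - P1 - P_pi
    return P1, P_pi, P0

# With at most one potential synapse per pair the macroscopic fractions are
# just the microscopic probabilities scaled by the potential connectivity.
single = macroscopic_fractions(0.2, 0.7, 0.5, {1: 1.0})
# Hypothetical case: half of the pairs have one, half have two potential synapses.
double = macroscopic_fractions(0.2, 0.7, 0.5, {1: 0.5, 2: 0.5})
```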

From this we can compute the macroscopic state variables Pstate 
defined as the fractions of neuron pairs that are in a particular 
state ∈ {∅,π,0,1} (where state ∅ denotes neuron pairs without any 
potential synapses) and the various connectivity measures defined 
in Section 1, 

$$ P_{\mathrm{state}}(t) = \sum_{s} P_{\mathrm{state}}^{(s)}(t) \quad \text{for state} \in \{\emptyset, \pi, 0, 1\} \qquad (20) $$

$$ P(t) = P_0(t) + P_1(t) \qquad (21) $$

$$ P_{\mathrm{pot}}(t) = P_\pi(t) + P_0(t) + P_1(t) \qquad (22) $$

$$ P_{1S} = \sum_{s \neq 0} \; \sum_{\mathrm{state} \in \{\emptyset,\pi,0,1\}} P_{\mathrm{state}}^{(s)}(t) \qquad (23) $$

$$ P_{\mathrm{eff}}(t) = \frac{\sum_{s \neq 0} P_1^{(s)}(t)}{P_{1S}} \qquad (24) $$

By these definitions we are in the position to do microscopic 
simulations of networks of potential synapses and compute the 
corresponding connectivity measures (e.g., as we have done for 
Fig. 6; see also Section 1). 

While we have worked out a general theoretical framework of 
structural plasticity [142], the following analyses will be limited to 
the much simpler case where a neuron pair has at most one 
synapse, p(1) = 1. Such a setting is justified by experimental 
findings that there is an active regulation of the total connection 
strength of the synapses connecting two neurons towards a 
constant value (see discussion section). 

I.2 Increase of Peff towards Ppot. To prove Eq. 11 let us 
now analyze the temporal dynamics of effectual connectivity Peff 
under simplified conditions. Specifically, we analyze the increase 
of Peff towards Ppot during consolidation in a large network with 
constant anatomical connectivity P having at most a single potential 
synapse per neuron pair. For this we will assume a simple constant 
consolidation signal, i.e., ongoing rehearsal or replay with 
s(t) = Sij ∈ {0,1} for t > 0. Constant P requires a homeostatic constraint 
where generation and elimination of synapses are in approximate 
balance, 

$$ p_g(t) \approx \frac{p_{e|0}\, P_0^{(0)}(t-1) + p_{e|1}\, P_0^{(1)}(t-1)}{P_\pi(t-1)} \qquad (25) $$

where $P_0^{(s)}$ is as defined in Sect. Mathematical Analysis I.1. 
Furthermore, we assume $p_{c|s} = s \in \{0,1\}$ and sufficiently large 
neuron populations u and v with sizes n ≫ 1 (cf. Fig. 3) such that 
Peff(t) and P0(t) (and P1(t) = P − P0(t)) are always close to their 
expectations. Thus, at any point in time, there exist $Pn^2$ synapses 
distributed over $P_{\mathrm{pot}} n^2$ possible locations. Before learning starts, 
the network has already $P_1(0) n^2$ consolidated synapses (e.g., due to 
earlier learned memories) that are unrelated to the novel memories 
specified by Sij. Thus, initially Peff(0) = P1(0) (Eq. 24). After the 
first learning step at t = 1 all available synapses get potentiated and 
consolidated, Peff(1) = P. For t > 1 it is 

$$ P_{\mathrm{eff}}(t) = P_{\mathrm{pot}} \left(1 - \prod_{t'=1}^{t} \left(1 - \frac{G(t')}{L(t')}\right)\right) $$

where G(t) is the number of new synapses generated at time t 
(which equals the number of eliminated synapses), L(t) is the 
number of potential locations to put them, and $\prod (1 - G/L)$ is the 
probability that a given potential synapse ij with Sij = 1 is not yet 
realized and consolidated until time t. For t = 1 we can assume 
$G(1) = Pn^2$ and $L(1) = P_{\mathrm{pot}} n^2$. For t > 1 it is $G(t) = p_{e|0} P_0(t-1) n^2$ 
and $L(t) = P_{\mathrm{pot}} n^2 - P n^2 + G(t)$, where the number of unconsolidated 
synapses, $P_0 n^2$, computes from 

$$ P_0(t) \approx P - (1 - P_{1S})\, P_1(0)\, (1 - p_{d|0})^t - P_{1S}\, P_{\mathrm{eff}}(t), $$

i.e., all real synapses minus initially consolidated (and not yet 
deconsolidated) synapses minus the newly consolidated synapses 
marked by S. Thus, the factors in the product become 
$1 - G(t')/L(t') \approx (P_{\mathrm{pot}} - P)/(P_{\mathrm{pot}} - P + p_{e|0} P_0(t'-1))$. Therefore 

$$ P_{\mathrm{eff}}(t) \approx P_{\mathrm{pot}} \left(1 - \left(1 - \frac{P}{P_{\mathrm{pot}}}\right) \prod_{t'=2}^{t} \frac{1}{1 + p_{e|0}\, P_0(t'-1)/(P_{\mathrm{pot}} - P)}\right) $$
proving Eq. 11. The second approximation in Eq. 11 becomes 
valid if all product terms are approximately equal, i.e., if $P_{1S} \ll 1$ 
(set of novel memories is small) and $t\,p_{d|0} \ll 1$ (deconsolidation 
during the time interval of rehearsal or replay is negligible). Note 
that here the increase of Peff(t) does not depend on $p_{d|1}$ since 
synapses with Sij = 1 that get deconsolidated are immediately 
($p_{c|1} = 1$) reconsolidated. 
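
The recursion behind this derivation can be iterated numerically. The following sketch assumes the relations stated in this section: Peff(0) = P1(0), Peff(1) = P, a per-step factor (Ppot − P)/(Ppot − P + p_e|0 · P0(t − 1)) for the probability that a relevant potential synapse is still not realized, and the expression for P0(t) given above. The function name and the parameter values are illustrative assumptions, not values taken from the paper.

```python
def effectual_connectivity(P, P_pot, P_1S, P1_0, p_e0, p_d0, T):
    """Iterate Peff(t) for t = 0..T under a constant consolidation signal."""
    peff = [P1_0, P]                 # Peff(0) and Peff(1)
    not_realized = 1.0 - P / P_pot   # product term for t' = 1
    for t in range(2, T + 1):
        prev = t - 1
        # Unconsolidated synapses after the previous step: all real synapses
        # minus surviving old consolidated ones minus newly consolidated ones.
        P0_prev = P - (1.0 - P_1S) * P1_0 * (1.0 - p_d0) ** prev - P_1S * peff[prev]
        P0_prev = max(P0_prev, 0.0)
        # Probability that a given relevant location is again missed in step t.
        not_realized *= (P_pot - P) / (P_pot - P + p_e0 * P0_prev)
        peff.append(P_pot * (1.0 - not_realized))
    return peff

# Hypothetical parameters: sparse anatomical connectivity, small memory set.
trace = effectual_connectivity(P=0.1, P_pot=0.4, P_1S=0.1, P1_0=0.0,
                               p_e0=0.5, p_d0=0.0, T=200)
```

With these illustrative values Peff starts at P after the first learning step and then increases monotonically towards Ppot, which is the qualitative behavior the derivation describes.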



II Evaluation of Memory Capacity 

II.1 Asymptotic analysis for one-step retrieval. As argued 
in Section 6, the storage capacity of structurally plastic networks 
where memories are stored with effectual connectivity Peff is 
equivalent to the capacity of a structurally static network with 
increased anatomical connectivity P = Peff (cf. Fig. 3). Therefore 
the following computes the storage capacity for one-step retrieval 
in the Willshaw network without any structural plasticity 
($p_{e|s} = p_g = p_{d|s} = 0$, $p_{c|s} = s$; see Section 3 and Fig. 3A) where 
synaptic weights are given by Eq. 5. 

For the following approximate asymptotic analysis we use several 
simplifications. First, address and content memory patterns are 
binary random vectors of size n each having k active units (i.e., k is the 
size of a Hebbian cell assembly representing the memory in 
population u or v). Second, the query pattern $\tilde{u}$ has $c := \lambda k$ 
randomly chosen "correct" one-entries of an address pattern $u^\mu$ 
(where $0 < \lambda \le 1$) but no additional "false" one-entries ($f := \kappa k = 0$). 
Third, as previously suggested [78,143-145], we assume that each 
neuron j can optimize its firing threshold $\Theta_j := c'(j)$ according to the 
number of connected active "correct" query neurons, that is, 
$c'(j) := \#\{i : \tilde{u}_i = 1 \text{ and } \mathrm{state}(ij) \in \{0,1\}\}$. 

Let us first estimate error probabilities after storing M 
associations. We have $q_{10} = 0$ due to the assumptions of optimal 
threshold control and zero add noise ($\kappa = 0$). To see this note that 
$W_{ij} = 1$ for any actual synapse ij with $\tilde{u}_i = 1$ (which implies $u_i^\mu = 1$ 
due to the zero add noise assumption) and $v_j^\mu = 1$. Therefore the 
dendritic potential $x_j$ will equal $\Theta_j = c'(j)$ and thus $\hat{v}_j = 1$ if $v_j^\mu = 1$. 
By contrast, $q_{01}$ depends on the probability $p_1$ that a given synapse 
is potentiated (see Eqs. 4, 5). After storing M memory associations 
we have 

$$ p_1 = 1 - \left(1 - \frac{k^2}{n^2}\right)^M \qquad (26) $$

This follows from the fact that a synapse is potentiated with 
probability $k^2/n^2$ during presentation of a single memory. After 
presentation of all M memories, the synapse will therefore still be 
in state 0 (unpotentiated) with probability $p_0 = (1 - k^2/n^2)^M$. The 
state probability $p_1$ has been called "memory load" or "matrix 
load" in previous works [16] because, for fully connected 
networks, $p_1$ corresponds to the fraction of one-entries in the 
weight matrix. From Eq. 26 we obtain that a "low neuron" j with 
$v_j^\mu = 0$ may fire with error probability 

$$ q_{01} = \sum_{c'=0}^{c} p_B(c'; c, P)\, p_1^{c'} = \big(1 - P(1 - p_1)\big)^c \qquad (27) $$

where $p_B(x; N, P) := \binom{N}{x} P^x (1 - P)^{N-x}$ is the binomial 
probability. Note that c'(j) follows a binomial distribution such that 
$\mathrm{pr}[c'(j) = c'] = p_B(c'; c, P)$. Thus, the sum in Eq. 27 averages over 
all possible values of c' where the error probability given c' is 
$\mathrm{pr}[\hat{v}_j = 1 \mid v_j^\mu = 0] = p_1^{c'}$. This is because an error requires that all c' 
relevant synapses of neuron j are potentiated, where the 
probability of one synapse being potentiated is $p_1$. An exact 
analysis shows that this binomial approximation of $q_{01}$ becomes 
exact in the limit of large networks and sufficiently small cell 
assemblies with $k = O(n/\log^2 n)$ (see [129]; see also Section II.2). 

Now we can compute the storage capacity by limiting output 
noise $\hat{\epsilon}$ (Eq. 7) by some constant $\epsilon > 0$. Thus, we have to solve 

$$ (n - k)\, q_{01} \le \epsilon k \qquad (28) $$

for $p_1$ which gives the maximal matrix load $p_{1\epsilon}$ of Eq. 13 that 
satisfies $\hat{\epsilon} \le \epsilon$. With this, solving Eq. 26 for M yields the pattern 
capacity $M_\epsilon$ of Eq. 13. For small $\epsilon$ and k/n it is 
$T \approx -(k/n)\,\mathrm{ld}(k/n)$ and with Eq. 10 it follows the weight capacity 
Eq. 14. 
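
The chain from Eq. 26 to Eq. 28 can be checked numerically. The sketch below is an illustration rather than the authors' code: it evaluates the binomial sum of Eq. 27 against its closed form, solves Eq. 28 with equality for the maximal matrix load, and inverts Eq. 26 to obtain the largest number of memories that stays below this load. The closed-form solution for the matrix load is derived here directly from Eqs. 27 and 28; the function names and the example parameters are assumptions.

```python
import math

def q01_binomial(c, P, p1):
    """Left-hand side of Eq. 27: average of p1**c' over binomially distributed c'."""
    return sum(math.comb(c, j) * P**j * (1 - P)**(c - j) * p1**j for j in range(c + 1))

def q01_closed(c, P, p1):
    """Right-hand side of Eq. 27."""
    return (1 - P * (1 - p1)) ** c

def max_matrix_load(n, k, P, eps, lam=1.0):
    """Solve (n - k) * q01 = eps * k for p1, using the closed form of Eq. 27."""
    c = lam * k
    bound = eps * k / (n - k)
    return 1 - (1 - bound ** (1 / c)) / P

def pattern_capacity(n, k, P, eps, lam=1.0):
    """Largest M whose memory load (Eq. 26) does not exceed the maximal load."""
    p1_eps = max_matrix_load(n, k, P, eps, lam)
    if p1_eps <= 0:
        return 0  # connectivity too low for this noise bound
    return math.floor(math.log1p(-p1_eps) / math.log1p(-(k / n) ** 2))

# Hypothetical example: fully connected network with sparse patterns.
n, k, P, eps = 10**5, 50, 1.0, 0.01
p1_eps = max_matrix_load(n, k, P, eps)
M_eps = pattern_capacity(n, k, P, eps)
```

For diluted networks the maximal load shrinks quickly with decreasing P and can become non-positive, in which case the sketch returns a capacity of zero; this is the regime where a higher effectual connectivity pays off.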

For networks with structural plasticity Eq. 13 is still valid but 
effectual connectivity will typically be larger than anatomical 
connectivity, Peff > P. As silent synapses are functionally irrelevant 
and can be pruned (but see the remarks below) we can compute 
total storage capacity in bits per synapse by renormalizing Eq. 
14. Thus, dividing the totally stored information by $P_{\mathrm{eff}}\, p_{1\epsilon}\, n^2$ 
instead of $P_{\mathrm{eff}}\, n^2$ yields 

$$ C^{tot} := \frac{C^{wp}}{p_{1\epsilon}} \approx \frac{\mathrm{ld}\big(1 - P_{\mathrm{eff}}(1 - p_{1\epsilon})\big)\, \ln(1 - p_{1\epsilon})}{p_{1\epsilon}\, P_{\mathrm{eff}}}\; \eta \qquad (29) $$

For large Peff → 1 and small $p_{1\epsilon} = (\epsilon k/(n-k))^{1/(\lambda k)} \to 0$ the total 
storage capacity per synapse diverges with network size n, 

$$ C^{tot} \approx -\eta\, \mathrm{ld}\, p_{1\epsilon} \to \infty \qquad (30) $$



Together with Eq. 11 this proves that in networks with 
structural plasticity, high potential connectivity, and sufficiently 
small cell assembly size k, it is possible to come close to the 
information theoretic capacity bound (see Eq. 12). 

One limitation of this analysis is the assumption of an optimal 
threshold control. In fact, an optimal threshold control as 
presumed above would actually require silent synapses in order 
to compute spike thresholds $\Theta_j = c'(j)$ in incompletely connected 
excitatory networks with Peff < 1 [143,144] (so they should not be 
pruned). Therefore we will use the resulting expressions for $C^{tot}$ 
merely for approximating the storage capacity for a more 
conservative threshold control (see next section). Nevertheless the 
results are still asymptotically correct for high effectual connectivity 
Peff → 1 because then the optimal spike threshold c'(j) = c becomes 
independent of remaining silent synapses [16]. Corresponding 
results hold true also for inhibitory network models where an 
optimal spike threshold control could easily be realized (including 
pruning of silent synapses) because it is independent of c'(j) for any 
Peff [78]. This suggests that structural plasticity could store 
information in inhibitory networks even more efficiently than in 
excitatory networks (cf. [13]). 

II.2 Numerical evaluation for finite networks. The 
analysis of the previous section is asymptotically correct for large 
networks (n → ∞), large connectivity (Peff → 1), and sparse activity 
($k = O(n/\log^2 n)$) [16,129]. It is also useful to get an overview 
of the qualitative effect of increasing effectual connectivity Peff 
and its relation to the memory load $p_1$. To compute storage 
capacity of finite networks with large activity k and low 
connectivity Peff it is possible to do an exact analysis by 
generalizing the approach of [129]. However, as such an approach 
would be computationally very expensive, the following develops a 
Gaussian approximation of dendritic potential distributions, which 
can reduce computation time by several orders of 
magnitude. For example, in some preliminary experiments we 
have evaluated the exact storage capacity $M_\epsilon = 25005$ for n = 10^, 
k = 724, Peff = 0.5 for λ = 1, κ = 0, ε = 0.01 which took about 57 h 
on a single core of a 2.2 GHz AMD Opteron compute server. By 
comparison, using the Gaussian approximation developed in this 
section yields $M_\epsilon = 24851$, quite close to the exact value, but took 
only 2.5 sec computing time. 

Let us first consider the Willshaw-Palm distribution 
$p_{WP}(x; k,n,M,P,z)$ defined as the exact probability that a content 
neuron's dendritic potential $x_j$ equals x given that M random 
memories are stored in a heteroassociative Willshaw-Palm network 
with size n, anatomical connectivity P, and (constant) activity k if 
stimulating with a random pattern (unrelated to the stored 
memories) with z active units. From Eq. 3.22 in [129] we obtain 
$p_{WP}$ for the special case of fully connected networks (P = 1), 

$$ p_{WP}(x; k,n,M,1,z) = \binom{z}{x} \sum_{s=0}^{x} (-1)^s \binom{x}{s} \left[1 - \frac{k}{n}\big(1 - B(n,k,s+z-x)\big)\right]^M \qquad (31) $$

where $B(a,b,c) := \prod_{i=0}^{c-1} (a-b-i)/(a-i) = B(a,c,b)$. In networks 
with general connectivity P each of the z 
active input units is connected to neuron j with probability P. 
Therefore the number of connected neurons is binomially 
distributed and 

$$ p_{WP}(x; k,n,M,P,z) = \sum_{z'=0}^{z} p_B(z'; z,P)\, p_{WP}(x; k,n,M,1,z') \qquad (32) $$

We can now determine the first two moments of this distribution. The 
mean $E(x_j)$ can easily be computed from the memory load Eq. 26, 

$$ \mu_{WP} := E(x_j) = zPp_1 \qquad (33) $$

and the variance 

$$ \sigma_{WP}^2 := E\big((x_j - \mu_{WP})^2\big) = \sum_{x=0}^{z} (x - \mu_{WP})^2\, p_{WP}(x; k,n,M,P,z) \qquad (34) $$

$$ = \sum_{z'=0}^{z} p_B(z'; z,P) \sum_{x=0}^{z'} (x - \mu_{WP})^2\, p_{WP}(x; k,n,M,1,z') \qquad (35) $$

can be computed from the corresponding variance of a fully connected 
network which is well approximated by (see Eq. 4.25 in [129]) 

$$ \sigma_{WP}^2\big|_{P=1} \approx z(1 - p_0) - z\big(1 - 2p_0 + p_0^{(2)}\big) + z^2\big(p_0^{(2)} - p_0^2\big) \qquad (37) $$

where $p_0 := 1 - p_1$ (cf. Eq. 26) and $p_0^{(2)} := \big(1 - (k^2/n^2)(2 - k/n)\big)^M$. 
Therefore the variance of the diluted 
network is well approximated by 

$$ \sigma_{WP}^2 \approx zP(1 - p_0) - zP^2\big(1 - 2p_0 + p_0^{(2)}\big) + z^2P^2\big(p_0^{(2)} - p_0^2\big) \qquad (38) $$

From these results we can easily compute mean values and 
variances of the dendritic potential distributions of high and low 
units. Here high units are neurons $v_j$ with $v_j^\mu = 1$, i.e., neurons that 
should be activated during retrieval. Similarly, low units are 
neurons $v_j$ with $v_j^\mu = 0$. Thus, if the query pattern $\tilde{u}$ has exactly c 
correct units from an address memory $u^\mu$ and additionally f 
randomly chosen false units (not active in $u^\mu$) then the mean and 
variance of a low unit's dendritic potential will be 

$$ \mu_{lo} = (c+f)\,P p_1 \qquad (39) $$

$$ \sigma_{lo}^2 = (c+f)P(1 - p_0) - (c+f)P^2\big(1 - 2p_0 + p_0^{(2)}\big) + (c+f)^2P^2\big(p_0^{(2)} - p_0^2\big) \qquad (40) $$

and mean and variance of a high unit's dendritic potential will be 

$$ \mu_{hi} = cP + fPp_1 \qquad (41) $$

$$ \sigma_{hi}^2 = cP(1 - P) + fP(1 - p_0) - fP^2\big(1 - 2p_0 + p_0^{(2)}\big) + f^2P^2\big(p_0^{(2)} - p_0^2\big) \qquad (42) $$

Assuming Gaussian distributions we can compute a globally 
optimal firing threshold $\Theta$ that minimizes output noise $\hat{\epsilon}$ by 
applying some standard methods (e.g., see appendix D in [46]). 
Then we can determine pattern capacity $M_\epsilon$ by doing a binary 
search to efficiently find the maximal M that satisfies $\hat{\epsilon} \le \epsilon$. Finally, 
we can determine $C^{wp}$ from Eq. 10 and thus also $p_{1\epsilon}$ from Eq. 26 
and $C^{tot} := C^{wp}/p_{1\epsilon}$. Corresponding data for n = 10^ is shown in 
Fig. 5. 
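
The procedure of this section can be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation behind the reported numbers: the globally optimal threshold is found by a plain scan over integer thresholds instead of the method of [46], the output noise is taken as the expected number of wrongly active and wrongly silent units divided by k, and the network size n = 10**5 used in the example is a hypothetical choice. All function names are assumptions.

```python
import math

def potential_stats(n, k, M, P, c, f):
    """Means and variances of low and high units (Eqs. 39-42)."""
    p0 = (1 - (k / n) ** 2) ** M
    p0_2 = (1 - (k / n) ** 2 * (2 - k / n)) ** M
    p1 = 1 - p0

    def var_random(z):  # variance contributed by z random query units (Eq. 38)
        return (z * P * (1 - p0) - z * P**2 * (1 - 2 * p0 + p0_2)
                + z**2 * P**2 * (p0_2 - p0**2))

    mu_lo, var_lo = (c + f) * P * p1, var_random(c + f)
    mu_hi, var_hi = c * P + f * P * p1, c * P * (1 - P) + var_random(f)
    return mu_lo, var_lo, mu_hi, var_hi

def upper_tail(x, mu, var):
    """Gaussian probability of a potential at or above x."""
    return 0.5 * math.erfc((x - mu) / math.sqrt(2 * var))

def lower_tail(x, mu, var):
    """Gaussian probability of a potential below x."""
    return 0.5 * math.erfc((mu - x) / math.sqrt(2 * var))

def output_noise(n, k, M, P, c, f):
    """Minimal output noise over all integer firing thresholds."""
    mu_lo, var_lo, mu_hi, var_hi = potential_stats(n, k, M, P, c, f)
    var_lo, var_hi = max(var_lo, 1e-12), max(var_hi, 1e-12)  # avoid division by zero
    best = math.inf
    for theta in range(c + f + 1):
        q01 = upper_tail(theta, mu_lo, var_lo)  # low unit fires erroneously
        q10 = lower_tail(theta, mu_hi, var_hi)  # high unit stays silent
        best = min(best, ((n - k) * q01 + k * q10) / k)
    return best

def pattern_capacity(n, k, P, eps, lam=1.0, kappa=0.0):
    """Binary search for the largest M with output noise of at most eps."""
    c, f = round(lam * k), round(kappa * k)
    lo, hi = 0, 1
    while output_noise(n, k, hi, P, c, f) <= eps:
        hi *= 2
    while hi - lo > 1:  # invariant: noise(lo) <= eps < noise(hi)
        mid = (lo + hi) // 2
        if output_noise(n, k, mid, P, c, f) <= eps:
            lo = mid
        else:
            hi = mid
    return lo

M_gauss = pattern_capacity(n=10**5, k=724, P=0.5, eps=0.01)
```

The binary search relies on the output noise growing with the number of stored memories, which holds for the parameter ranges considered here but is an assumption of the sketch. Because the threshold scan and the noise definition are simplified, the resulting capacity should only be read as an estimate of the same order as the values quoted above.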